Here you can find the list of speakers for the Summer Workshop 2012. More detailed information about the talks will be available shortly.
Crowdsourcing: Achieving Data Quality with Imperfect Humans
Abstract. Crowdsourcing is a great tool to collect data and support machine learning; it is the ultimate form of outsourcing. But crowdsourcing introduces budget and quality challenges that must be addressed to realize its benefits. In this talk, I will discuss the use of crowdsourcing for building robust machine learning models quickly and under budget constraints. I’ll operate under the realistic assumption that we are processing imperfect labels that reflect random and systematic error on the part of human workers. To illustrate, I’ll use classification problems that arise in online advertising. Finally, I’ll discuss our latest results showing that mice and Mechanical Turk workers are not that different after all.
Speaker’s Short Bio. Panos Ipeirotis is an Associate Professor and George A. Kellner Faculty Fellow at the Department of Information, Operations, and Management Sciences at the Leonard N. Stern School of Business of New York University. He is also the Chief Scientist at Tagasauris, and in 2012-2013 serves as “academic-in-residence” at oDesk Research. His recent research interests focus on crowdsourcing and on mining user-generated content on the Internet. He received his Ph.D. degree in Computer Science from Columbia University in 2004, working with Prof. Luis Gravano. He has received three “Best Paper” awards (IEEE ICDE 2005, ACM SIGMOD 2006, WWW 2011), two “Best Paper Runner Up” awards (JCDL 2002, ACM KDD 2008), and is also a recipient of a CAREER award from the National Science Foundation and several other industry grants. In his spare time, he writes about crowdsourcing and various other topics on his blog, “A Computer Scientist in a Business School,” an activity that seems to generate more interest and recognition than any of the above.
Big Data, Causal Modeling, and Robust Estimation
Abstract. “Big data” is the new buzzword in the world of information collection and analysis. The data we collect continue to grow in size, in part due to rapidly expanding technology in different sectors that allows us to measure and store drastically more information. Techniques from computer science, engineering, and statistics have been brought together in collaborative efforts to get the most out of these data. This talk will give an overview of the relationship between causal modeling and estimation, including machine learning, and the many applications of these techniques.
Speaker’s Short Bio. Sherri Rose, PhD, is an NSF Mathematical Sciences Postdoctoral Research Fellow in the Department of Biostatistics at the Johns Hopkins Bloomberg School of Public Health. Her research focuses on causal inference, double robust estimation, prediction, and machine learning. Dr. Rose recently coauthored the book “Targeted Learning: Causal Inference for Observational and Experimental Data” with Mark van der Laan for the Springer Series in Statistics. She received her Ph.D. in Biostatistics from the University of California, Berkeley in 2011, where she was the recipient of the Evelyn Fix Memorial Prize. Other honors include the Gertrude M. Cox Scholarship in Statistics from the American Statistical Association and the Recent Alumni Achievement Award from The George Washington University.
Privacy in Web Search Query Log Mining
Abstract. Web search engines have changed our lives – enabling instant access to information about subjects deeply important to us as well as about passing whims. The search engines that provide answers to our search queries also log those queries in order to improve their algorithms. Academic research on search queries has shown that they can provide valuable information on diverse topics, including word and phrase similarity and topical seasonality, may even have potential for sociology, and can serve as a barometer of the popularity of many subjects. At the same time, individuals are rightly concerned about what accidental leaking or deliberate sharing of this information may mean for their privacy. In this talk I will cover the applications which have benefited from mining query logs, the risks that privacy can be breached by sharing query logs, and current algorithms for mining logs in a way that prevents privacy breaches.
Speaker’s Short Bio. Rosie Jones is Director of Computational Advertising at Akamai Technologies. Her research interests include computational advertising, web search, geographic information retrieval, and natural language processing. She previously worked at Yahoo! on query log analysis and privacy. She received her PhD from the School of Computer Science at Carnegie Mellon University. She has served on the Senior PC for SIGIR 2007-2009, and is a Senior Member of the ACM.
BIG DATA AND PRIVACY: A PERFECT STORM? Achieving Trust in Quantifying our Environment