Summer Workshop 2012: Speakers

Here you can find the list of speakers for the Summer Workshop 2012. More detailed information about the talks will be available shortly.

Crowdsourcing: Achieving Data Quality with Imperfect Humans
Panos Ipeirotis

Abstract. Crowdsourcing is a great tool to collect data and support machine learning; it is the ultimate form of outsourcing. But crowdsourcing introduces budget and quality challenges that must be addressed to realize its benefits. In this talk, I will discuss the use of crowdsourcing for building robust machine learning models quickly and under budget constraints. I’ll operate under the realistic assumption that we are processing imperfect labels that reflect random and systematic error on the part of human workers. To illustrate, I’ll use classification problems that arise in online advertising. Finally, I’ll discuss our latest results showing that mice and Mechanical Turk workers are not that different after all.

Speaker’s Short Bio. Panos Ipeirotis is an Associate Professor and George A. Kellner Faculty Fellow at the Department of Information, Operations, and Management Sciences at the Leonard N. Stern School of Business of New York University. He is also the Chief Scientist at Tagasauris, and in 2012-2013 serves as “academic-in-residence” at oDesk Research. His recent research interests focus on crowdsourcing and on mining user-generated content on the Internet. He received his Ph.D. degree in Computer Science from Columbia University in 2004, working with Prof. Luis Gravano. He has received three “Best Paper” awards (IEEE ICDE 2005, ACM SIGMOD 2006, WWW 2011), two “Best Paper Runner Up” awards (JCDL 2002, ACM KDD 2008), and is also a recipient of a CAREER award from the National Science Foundation and several other industry grants. In his spare time, he writes about crowdsourcing and various other topics on his blog, “A Computer Scientist in a Business School,” an activity that seems to generate more interest and recognition than any of the above.

 

Big Data, Causal Modeling, and Robust Estimation
Sherri Rose

Abstract. “Big data” is the new buzzword in the world of information collection and analysis. The data we collect continue to grow in size, in part due to rapidly expanding technology in different sectors that allows us to measure and store drastically more information. Techniques from computer science, engineering, and statistics have been brought together in collaborative efforts to get the most out of these data. This talk will give an overview of the relationship between causal modeling and estimation, including machine learning, and the many applications of these techniques.

Speaker’s Short Bio. Sherri Rose, PhD is an NSF Mathematical Sciences Postdoctoral Research Fellow in the Department of Biostatistics at the Johns Hopkins Bloomberg School of Public Health. Her research focuses on causal inference, double robust estimation, prediction, and machine learning. Dr. Rose recently coauthored the book “Targeted Learning: Causal Inference for Observational and Experimental Data” with Mark van der Laan for the Springer Series in Statistics. She received her Ph.D. in Biostatistics from the University of California, Berkeley in 2011, where she was the recipient of the Evelyn Fix Memorial Prize. Other honors include the Gertrude M. Cox Scholarship in Statistics from the American Statistical Association and the Recent Alumni Achievement Award from The George Washington University.

 

Privacy in Web Search Query Log Mining
Rosie Jones

Abstract. Web search engines have changed our lives – enabling instant access to information about subjects that are deeply important to us as well as passing whims. The search engines that provide answers to our search queries also log those queries in order to improve their algorithms. Academic research on search queries has shown that they can provide valuable information on diverse topics, including word and phrase similarity and topical seasonality, and may even have potential for sociology, as well as providing a barometer of the popularity of many subjects. At the same time, individuals are rightly concerned about what the accidental leaking or deliberate sharing of this information may mean for their privacy. In this talk I will cover the applications that have benefited from mining query logs, the risks that privacy can be breached by sharing query logs, and current algorithms for mining logs in a way that prevents privacy breaches.

Speaker’s Short Bio. Rosie Jones is Director of Computational Advertising at Akamai Technologies. Her research interests include computational advertising, web search, geographic information retrieval, and natural language processing. She previously worked at Yahoo! on query log analysis and privacy. She received her PhD from the School of Computer Science at Carnegie Mellon University. She has served on the Senior PC for SIGIR 2007-2009, and is a Senior Member of the ACM.

 

BIG DATA AND PRIVACY: A PERFECT STORM? Achieving Trust in Quantifying our Environment
Stefaan Verhulst

Abstract. Big data offers huge potential to create economic value and impact the way organizations make decisions and operate. At the same time, quantifying our environment poses various policy challenges for governing the collection, processing, handling and utilization of personal information, particularly in the area of privacy. Existing privacy frameworks are increasingly challenged by the broad-based use of new technologies that enable Big Data. Advances in re-identification allow people to be identified more easily than once assumed. In addition, the Fair Information Practice Principles (FIPPs), developed alongside the floppy disk, are antiquated in the era of Big Data. Further, the U.S. lacks a baseline privacy law. Because privacy laws in the U.S. have developed by sector, much of the personal information captured, used, distributed and stored on the Internet is not subject to Federal statutory protection. The ability of policy makers and technologists to establish an appropriate policy and technology framework to govern privacy will be imperative to realizing the full potential of Big Data. In this talk, I will look at the privacy challenges posed by Big Data, using a staged approach, and examine a handful of new frameworks that have been proposed to maximize the value of Big Data while protecting privacy. Although the proposed frameworks vary, most acknowledge that trust will be essential as we quantify our environment further.

Speaker’s Short Bio. Stefaan G. Verhulst is the Chief of Research at the Markle Foundation and Senior Research Fellow at the Center for Global Communications Studies, Annenberg School for Communications, University of Pennsylvania. He is also an Adjunct Professor in the Department of Culture and Communications at New York University, and Senior Research Fellow for the Center for Media and Communications Studies, Central European University in Budapest. Previously, he was the Co-Founder and Co-Director, with Professor Monroe Price, of the Programme in Comparative Media Law and Policy (PCMLP) at Oxford University, as well as Senior Research Fellow at the Centre for Socio Legal Studies. In that capacity, he was appointed the Socio-Legal Research Fellow at Wolfson College at Oxford. Verhulst was the UNESCO Chairholder in Communications Law and Policy for the UK, a former lecturer on Communications Law and Policy issues in Belgium, and Founder and Co-Director of the International Media and Info-Comms Policy and Law Studies (IMPS) at the School of Law, University of Glasgow. Verhulst has served as consultant to various international and national organizations, including the Council of Europe, European Commission, UNESCO, World Bank, UNDP, USAID, and DFID.

Verhulst is the author and co-author of several books and numerous articles and chapters. He is the Founder and Editor of the International Journal of Communications Law and Policy, and the Communications Law in Transition Newsletter.
