Enforcing k-anonymity in Web Mail Auditing (English)

WSDM, International Conference on Web Search & Data Mining, 9

We study the problem of k-anonymization of mail messages in the realistic scenario of auditing mail traffic in a major commercial Web mail service. Mail auditing is necessary in various Web mail debugging and quality assurance activities, such as anti-spam or the qualitative evaluation of novel mail features. It is conducted by trained professionals, often referred to as "auditors", who are shown messages that could expose personally identifiable information. We address here the challenge of k-anonymizing such messages, focusing on machine-generated mail messages, which represent more than 90% of today's mail traffic. We introduce a novel message signature, Mail-Hash, specifically tailored to identifying structurally similar messages, which allows us to put such messages in the same equivalence class. We then define a process that generates, for each class, masked mail samples that can be shown to auditors while guaranteeing the k-anonymity of users. The productivity of auditors is measured by the amount of non-hidden mail content they can see every day under normal working conditions, which limit the number of mail samples they can review. In addition, we consider k-anonymity over time since, by definition of k-anonymity, every new release places additional constraints on the assignment of samples. We describe in detail the results we obtained over actual Yahoo mail traffic, and thus demonstrate that our methods are feasible at Web mail scale. Given the constantly growing concern of users over their email being scanned by others, we argue that it is critical to devise algorithms that guarantee k-anonymity, and to implement the associated processes, in order to restore the trust of mail users.
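As a rough illustration of the grouping-and-masking idea described in the abstract, the following minimal Python sketch places messages in equivalence classes and emits one masked sample per class. It is not the authors' method: the `signature` function is a hypothetical stand-in (sender plus token count) and does not reproduce Mail-Hash, the names `masked_samples` and `MASK` are assumptions made for this example, and k-anonymity over successive releases is not modeled.

```python
import hashlib
from collections import defaultdict

MASK = "****"  # hypothetical placeholder shown to auditors in place of hidden tokens


def signature(sender: str, body: str) -> str:
    """Crude structural signature: sender plus token count (an assumption, NOT Mail-Hash)."""
    shape = f"{sender}|{len(body.split())}"
    return hashlib.sha1(shape.encode("utf-8")).hexdigest()


def masked_samples(messages, k):
    """Return {signature: masked sample} for classes covering at least k distinct users.

    messages is an iterable of (user_id, sender, body) tuples.
    """
    # Group messages into equivalence classes by signature.
    classes = defaultdict(list)
    for user, sender, body in messages:
        classes[signature(sender, body)].append((user, body.split()))

    samples = {}
    for sig, members in classes.items():
        users = {user for user, _ in members}
        if len(users) < k:
            continue  # class too small to release anything

        # For every token position, record which users share each token value.
        # All members have the same token count because it is part of the signature.
        width = len(members[0][1])
        support = [defaultdict(set) for _ in range(width)]
        for user, tokens in members:
            for i, tok in enumerate(tokens):
                support[i][tok].add(user)

        # Show a token only if at least k distinct users have it at that position.
        _, first = members[0]
        samples[sig] = " ".join(
            tok if len(support[i][tok]) >= k else MASK
            for i, tok in enumerate(first)
        )
    return samples
```

With three users receiving "Hello <name> your order <number> shipped" from the same sender and k = 3, the sketch keeps the shared template words and masks the name and order number; a class seen by fewer than k users yields no sample at all.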

Table of contents conference proceedings

The table of contents of the conference proceedings is generated automatically and may therefore be incomplete, although all articles are available in the TIB.

1
Large-Scale Deep Learning For Building Intelligent Computer Systems
Dean, Jeffrey | 2016
13
Cross-modality Consistent Regression for Joint Visual-Textual Sentiment Analysis of Social Multimedia
You, Quanzeng / Luo, Jiebo / Jin, Hailin / Yang, Jianchao | 2016
33
Quantifying Controversy in Social Media
Garimella, Kiran / De Francisci Morales, Gianmarco / Gionis, Aristides / Mathioudakis, Michael | 2016
53
Exploiting New Sentiment-Based Meta-level Features for Effective Sentiment Analysis
Canuto, Sergio / Goncalves, Marcos Andre / Benevenuto, Fabricio | 2016
63
Mobile App Tagging
Chen, Ning / Hoi, Steven C.H. / Li, Shaohua / Xiao, Xiaokui | 2016
83
On the Efficiency of the Information Networks in Social Media
Babaei, Mahmoudreza / Grabowicz, Przemyslaw / Gummadi, Krishna P. / Gomez-Rodriguez, Manuel / Valera, Isabel | 2016
93
Modeling and Predicting Learning Behavior in MOOCs
Qiu, Jiezhong / Tang, Jie / Liu, Tracy-Xiao / Gong, Jie / Zhang, Chenhui / Zhang, Qian / Xue, Yufei | 2016
103
Beyond Ranking. Optimizing Whole-Page Presentation
Wang, Yue / Yin, Dawei / Jie, Luo / Wang, Pengyuan / Yamada, Makoto / Chang, Yi / Mei, Qiaozhu | 2016
113
Understanding User Attention and Engagement in Online News Reading
Lagun, Dmitry / Lalmas, Mounia | 2016
123
Publication Date Prediction through Reverse Engineering of the Web
Ostroumova Prokhorenkova, Liudmila / Prokhorenkov, Petr / Samosvat, Egor / Serdyukov, Pavel | 2016
133
To Suggest, or Not to Suggest for Queries with Diverse Intents. Optimizing Search Result Presentation
Kato, Makoto P. / Tanaka, Katsumi | 2016
143
Term-by-Term Query Auto-Completion for Mobile Search
Vargas, Saul / Blanco, Roi / Mika, Peter | 2016
163
Personalized PageRank Estimation and Search. A Bidirectional Approach
Lofgren, Peter / Banerjee, Siddhartha / Goel, Ashish | 2016
173
Your Cart tells You. Inferring Demographic Attributes from Purchase Data
Wang, Pengfei / Guo, Jiafeng / Lan, Yanyan / Xu, Jun / Cheng, Xueqi | 2016
193
Hierarchical Semi-supervised Classification with Incomplete Class Hierarchies
Dalvi, Bhavana / Mishra, Aditya / Cohen, William W. | 2016
203
Is Mail The Next Frontier In Search And Data Mining?
Maarek, Yoelle | 2016
205
Portrait of an Online Shopper. Understanding and Predicting Consumer Behavior
Kooti, Farshad / Lerman, Kristina / Aiello, Luca Maria / Grbovic, Mihajlo / Djuric, Nemanja / Radosavljevic, Vladan | 2016
215
Evolution of Privacy Loss in Wikipedia
Rizoiu, Marian-Andrei / Xie, Lexing / Caetano, Tiberio / Cebrian, Manuel | 2016
227
Modeling Intransitivity in Matchup and Comparison Data
Chen, Shuo / Joachims, Thorsten | 2016
237
Crowdsourcing High Quality Labels with a Tight Budget
Li, Qi / Ma, Fenglong / Gao, Jing / Su, Lu / Quinn, Christopher J. | 2016
267
Quality Management in Crowdsourcing using Gold Judges Behavior
Kazai, Gabriella / Zitouni, Imed | 2016
287
A Semantic Graph based Topic Model for Question Retrieval in Community Question Answering
Chen, Long / Jose, Joemon M. / Yu, Haitao / Yuan, Fajie / Zhang, Dell | 2016
297
Modeling Check-in Preferences with Multidimensional Knowledge: A Minimax Entropy Approach
Wang, Jingjing / Li, Min / Han, Jiawei / Wang, Xiaolong | 2016
307
You've got Mail, and Here is What you Could do With It! Analyzing and Predicting Actions on Email Messages
Di Castro, Dotan / Lewin-Eytan, Liane / Karnin, Zohar / Maarek, Yoelle | 2016
327
Enforcing k-anonymity in Web Mail Auditing
Di Castro, Dotan / Lewin-Eytan, Liane / Maarek, Yoelle / Wolff, Ran / Zohar, Eyal | 2016
337
An Information-Theoretic Approach to Individual Sequential Data Sanitization
Bonomi, Luca / Fan, Liyue / Jin, Hongxia | 2016
347
Improving IP Geolocation using Query Logs
Dan, Ovidiu / Parikh, Vaibhav / Davison, Brian D. | 2016
357
Geographic Segmentation via Latent Poisson Factor Model
Yu, Rose / Gelfand, Andrew / Rajan, Suju / Shahabi, Cyrus / Liu, Yan | 2016
367
Scaling up Link Prediction with Ensembles
Duan, Liang / Aggarwal, Charu / Ma, Shuai / Hu, Renjun / Huai, Jinpeng | 2016
377
DiFacto. Distributed Factorization Machines
Li, Mu / Liu, Ziqi / Smola, Alexander J. / Wang, Yu-Xiang | 2016
387
Distributed Balanced Partitioning via Linear Embedding
Aydin, Kevin / Bateni, Mohammad Hossein / Mirrokni, Vahab | 2016
397
Kangaroo. Workload-Aware Processing of Range Data and Range Queries in Hadoop
Aly, Ahmed M. / Elmeleegy, Hazem / Qi, Yan / Aref, Walid | 2016
407
Feedback Control of Real-Time Display Advertising
Zhang, Weinan / Rong, Yifei / Wang, Jun / Zhu, Tianchi / Wang, Xiaofan | 2016
427
Multi-view Machines
Cao, Bokai / Zhou, Hucheng / Li, Guoqiang / Yu, Philip S. | 2016
437
Transductive Classification on Heterogeneous Information Networks with Edge Betweenness-based Normalization
Bangcharoensap, Phiradet / Murata, Tsuyoshi / Kobayashi, Hayato / Shimizu, Nobuyuki | 2016
447
The Troll-Trust Model for Ranking in Signed Networks
Wu, Zhaoming / Aggarwal, Charu C. / Sun, Jimeng | 2016
457
Multileave Gradient Descent for Fast Online Learning to Rank
Schuth, Anne / Oosterhuis, Harrie / Whiteson, Shimon / de Rijke, Maarten | 2016
467
AMiner. Toward Understanding Big Scholar Data
Tang, Jie | 2016
469
Serving a Billion Personalized News Feeds
Backstrom, Lars | 2016
471
The Predictive Power of Massive Data about our Fine-Grained Behavior
Provost, Foster | 2016
473
Information Evolution in Social Networks
Adamic, Lada A. / Lento, Thomas M. / Adar, Eytan / Ng, Pauline C. | 2016
493
Querying and Tracking Influencers in Social Streams
Subbian, Karthik / Aggarwal, Charu C. / Srivastava, Jaideep | 2016
503
Centrality-Aware Link Recommendations
Parotsidis, Nikos / Pitoura, Evaggelia / Tsaparas, Panayiotis | 2016
513
Relational Learning with Social Status Analysis
Wu, Liang / Hu, Xia / Liu, Huan | 2016
523
Equality and Social Mobility in Twitter Discussion Groups
Ellis, Katherine / Goldszmidt, Moises / Lanckriet, Gert / Mishra, Nina / Reingold, Omer | 2016
553
Towards Modelling Language Innovation Acceptance in Online Social Networks
Kershaw, Daniel / Rowe, Matthew / Stacey, Patrick | 2016
563
Discriminative Learning of Infection Models
Rosenfeld, Nir / Nitzan, Mor / Globerson, Amir | 2016
573
Representation Learning for Information Diffusion through Social Networks: an Embedded Cascade Model
Bourigault, Simon / Lamprier, Sylvain / Gallinari, Patrick | 2016
583
Ensemble Models for Data-driven Prediction of Malware Infections
Kan, Chanhyun / Park, Noseong / Prakash, B. Aditya / Serra, Edoardo / Subrahmanian, V.S. | 2016
615
Improving Website Hyperlink Structure Using Server Logs
Paranjape, Ashwin / West, Robert / Zia, Leila / Leskovec, Jure | 2016
635
Semantic Documents Relatedness using Concept Graph Representation
Ni, Yuan / Mass, Yosi / Xu, Qiong Kai / Sheinwald, Dafna / Cao, Shao Sheng / Cao, Feng / Zhu, Hui Jia | 2016
655
Extracting Search Query Patterns via the Pairwise Coupled Topic Model
Konishi, Takuya / Ohwa, Takuya / Fujita, Sumio / Ikeda, Kazushi / Hayashi, Kohei | 2016
665
The Past and Future of Systems for Current Events
Naaman, Mor | 2016
667
Barbara Made the News. Mining the Behavior of Crowds for Time-Aware Learning to Rank
Martins, Flavio / Magalhaes, Joao / Callan, Jamie | 2016
677
Wiggins. Detecting Valuable Information in Dynamic Networks Using Limited Resources
Mahmoody, Ahmad / Riondato, Matteo / Upfal, Eli | 2016
687
Understanding Offline Political Systems by Mining Online Political Data
Lazer, David / Tsur, Oren / Eliassi-Rad, Tina | 2016
693
TargetAd2016. 2nd International Workshop on Ad Targeting at Scale
Grbovic, Mihajlo / Djuric, Nemanja / Radosavljevic, Vladan | 2016
699
Mining Complaints to Improve a Product. A Study about Problem Phrase Extraction from User Reviews
Tutubalina, Elena | 2016
701
Web-scale Multimedia Search for Internet Video Content
Jiang, Lu | 2016
707
Understanding Diffusion Processes. Inference and Theory
He, Xinran | 2016
709
E-commerce Product Recommendation by Personalized Promotion and Total Surplus Maximization
Zhao, Qi | 2016
713
User Modeling in Large Social Networks
Dong, Yuxiao | 2016
717
Temporal Formation and Evolution of Online Communities
Fani, Hossein | 2016
719
Mining the Web for Intelligent Problem Solving for Programmers
Rong, Xin | 2016
721
Optimizing Search Interactions within Professional Social Networks
Spirin, Nikita V. | 2016
