Class Year

2018

Document Type

Thesis

Honors Designation

High Honors

Department

Computer Science

Primary Advisor

Michael Hay

Abstract

The utility of machine learning is rising, coming from a growing wealth of data and problems that are becoming harder to solve analytically. With these changes there is also the need for interpretable machine learning in order for users to understand how a machine learning algorithm comes to a specific output. Bayesian Rule Lists, an interpretable machine learning algorithm, offers an advanced accuracy to interpretabilty trade off when compared to other interpretable machine learning algorithms. Additionally, with the amount of data collected today, there is a lot of potentially sensitive data that we can learn from such as medical and criminal records. However, to do so, we must guarantee a degree of privacy on the dataset; differential privacy has become the standard for this private data analysis. In this paper, we propose a differentially private algorithm for Bayesian Rule Lists.

We first break down the original Bayesian Rule List algorithm into three main components: frequent itemset mining, rule list sampling, and point estimate computation. We then perform a literature review to understand these algorithms, and ways to privatize them. There after we computed the necessary sensitivities for all subroutines, and ran experiments on the resulting differentially private algorithm to gauge utility. Results show that the proposed algorithm is able to output rule lists with good accuracy and decent interpretability.

Comments

Code from all experiments may be found at http://git.io/DP-BRL.

DP-SBRL.zip (778 kB)
Source code for experiments

Share

COinS