RAPPOR is a novel privacy technology that allows inferring statistics about populations while preserving the privacy of individual users.
This repository contains simulation and analysis code in Python and R.
For a detailed description of the algorithms, see the paper and links below.
Feel free to send feedback to firstname.lastname@example.org.
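RAPPOR builds on randomized response, the technique its name refers to. As a minimal, hypothetical illustration (classic single-bit randomized response, not RAPPOR's full encoding and not the rappor.py interface; the function names are made up), each user adds coin-flip noise before reporting a yes/no answer, yet the population fraction can still be estimated from many reports:

```python
import random
from typing import List


def randomized_response(true_bit: bool, p_truth: float,
                        rng: random.Random) -> bool:
    """Answer truthfully with probability p_truth, else report a fair coin flip."""
    if rng.random() < p_truth:
        return true_bit
    return rng.random() < 0.5


def estimate_fraction(reports: List[bool], p_truth: float) -> float:
    """Unbiased estimate of the fraction of users whose true bit is 1.

    E[observed] = p_truth * pi + (1 - p_truth) * 0.5, solved for pi.
    """
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) * 0.5) / p_truth


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only so the demo is reproducible
    truth = [rng.random() < 0.3 for _ in range(100_000)]  # ~30% have bit = 1
    reports = [randomized_response(b, 0.5, rng) for b in truth]
    print(f"estimated fraction: {estimate_fraction(reports, 0.5):.3f}")
```

No single report reveals a user's true bit with certainty, since any answer may have come from the coin flip; accuracy comes only from aggregating many reports.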
Running the Demo
Although the Python and R libraries should be portable to any platform, our end-to-end demo has only been tested on Linux.
If you don't have a Linux box handy, you can view the generated output.
To get your feet wet, install the R dependencies (details below). It should look something like this:
$ R
...
> install.packages(c('glmnet', 'optparse', 'ggplot2'))
$ ./demo.sh build # optional speedup, it's OK for now if it fails
This compiles and tests the fastrand C extension module for Python, which speeds up the simulation.
$ ./demo.sh run
The demo strings together the Python and R code. It:
- Generates simulated input data with different distributions
- Runs it through the RAPPOR privacy-preserving reporting mechanisms
- Analyzes and plots the aggregated reports against the true input
The output is written to _tmp/regtest/results.html, and can be opened with a web browser.
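The demo's pipeline (simulate inputs, encode them, aggregate the reports, compare against the truth) can be imitated in miniature. The sketch below is a simplified, hypothetical stand-in written for illustration, not the repository's rappor_sim.py or the rappor.py interface. It roughly follows the encoding steps described in the RAPPOR paper (a Bloom filter, a permanent randomized response with parameter f, then an instantaneous randomized response with p and q), but it omits cohorts, uses salted SHA-256 as an assumed hash choice, and estimates only per-bit counts rather than the full distribution the R analysis recovers. All function names are made up.

```python
import hashlib
import random
from typing import List, Tuple


def bloom_bits(value: str, num_bits: int, num_hashes: int) -> List[int]:
    """Set up to num_hashes bits of a num_bits Bloom filter for value.

    Salted SHA-256 is an illustrative choice of hash here.
    """
    bits = [0] * num_bits
    for i in range(num_hashes):
        digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
        bits[int.from_bytes(digest[:4], "big") % num_bits] = 1
    return bits


def permanent_rr(bits: List[int], f: float, rng: random.Random) -> List[int]:
    """Replace each bit by 1 (prob f/2) or 0 (prob f/2); keep it otherwise.

    A real client would memoize this per value so repeated reports reuse
    the same noisy bits; with one report per client that does not matter.
    """
    out = []
    for b in bits:
        u = rng.random()
        if u < f / 2:
            out.append(1)
        elif u < f:
            out.append(0)
        else:
            out.append(b)
    return out


def instantaneous_rr(bits: List[int], p: float, q: float,
                     rng: random.Random) -> List[int]:
    """Report 1 with probability q if the bit is 1, p if it is 0."""
    return [int(rng.random() < (q if b else p)) for b in bits]


def estimate_bit_counts(counts: List[int], n: int,
                        f: float, p: float, q: float) -> List[float]:
    """Unbiased per-bit estimate of how many of n clients had each bit set.

    P(report=1 | bit=1) = (1 - f/2) q + (f/2) p
    P(report=1 | bit=0) = (f/2) q + (1 - f/2) p = p_star
    The difference is (1 - f)(q - p), so solving
    E[c_i] = t_i * (1 - f)(q - p) + n * p_star for t_i gives the estimate.
    """
    p_star = (f / 2) * q + (1 - f / 2) * p
    return [(c - p_star * n) / ((1 - f) * (q - p)) for c in counts]


def simulate(population: List[str], k: int, h: int, f: float, p: float,
             q: float, rng: random.Random) -> Tuple[List[int], List[float]]:
    """Encode every value, sum the reports, return (true, estimated) counts."""
    true_counts = [0] * k
    report_counts = [0] * k
    for value in population:
        bits = bloom_bits(value, k, h)
        report = instantaneous_rr(permanent_rr(bits, f, rng), p, q, rng)
        for i in range(k):
            true_counts[i] += bits[i]
            report_counts[i] += report[i]
    est = estimate_bit_counts(report_counts, len(population), f, p, q)
    return true_counts, est


if __name__ == "__main__":
    population = ["a"] * 6000 + ["b"] * 3000 + ["c"] * 1000  # made-up data
    truth, est = simulate(population, k=16, h=2, f=0.5, p=0.5, q=0.75,
                          rng=random.Random(1))
    for i, (t, e) in enumerate(zip(truth, est)):
        print(f"bit {i:2d}: true={t:5d}  estimated={e:8.1f}")
```

Because each estimate divides by (1 - f)(q - p), a larger f, or p and q closer together, adds more noise per report and widens the gap between the true and estimated columns.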
R analysis (
Demo dependencies (
These are necessary if you want to test changes to the code.
Python client (
- None. You should be able to just import the
Platform:
- R: tested on R 3.0.
- Python: tested on Python 2.7.
- OS: the shell scripts have been tested on Linux, but may work on Mac/Cygwin. The R and Python code should work on any OS.
To run tests:
$ ./test.sh all
This currently runs Python unit tests, lints Python source files, and runs R unit tests.
rappor.py is a tiny standalone Python file, and you can easily copy it into a Python program.
NOTE: Its interface is subject to change. We are in the demo stage now, but if there's demand, we will document and publish the interface.
The R interface is also subject to change.
The fastrand C module is optional. It's likely only useful for simulation of thousands of clients. It doesn't use cryptographically strong randomness, and thus should not be used in production.
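In Python terms, the distinction is similar to random.Random (a Mersenne Twister: fast, seedable, and documented as unsuitable for cryptographic purposes) versus random.SystemRandom (backed by os.urandom()). The noise_bits helper below is hypothetical and not part of fastrand or rappor.py; it only shows where the choice of generator plugs in, since the privacy of the reports rests on the noise being unpredictable.

```python
import random
from typing import List


def noise_bits(num_bits: int, prob_one: float, rng: random.Random) -> List[int]:
    """Draw num_bits independent bits, each 1 with probability prob_one."""
    return [int(rng.random() < prob_one) for _ in range(num_bits)]


# Simulation: a seeded Mersenne Twister is fast and reproducible, but its
# internal state can in principle be recovered from enough outputs, which
# would let an observer predict (and strip) the noise.
sim_rng = random.Random(12345)

# Production-style choice: SystemRandom draws from os.urandom(), the OS
# cryptographically secure source; calling seed() on it has no effect.
secure_rng = random.SystemRandom()

if __name__ == "__main__":
    print(noise_bits(16, 0.5, sim_rng))
    print(noise_bits(16, 0.5, secure_rng))
```

Passing the generator in as a parameter keeps the encoding logic identical in both settings, so a simulation can use a fast seeded generator while a deployment swaps in a secure one.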
analysis/
  R/                   # R code for analysis
  cpp/                 # Fast reimplementations of certain analysis algorithms
apps/                  # Web apps to help you use RAPPOR (using Shiny)
bin/                   # Command line tools for analysis.
client/                # Client libraries
  python/              # Python client library
    rappor.py
    ...
  cpp/                 # C++ client library
    encoder.cc
    ...
doc/                   # Documentation
tests/                 # Tools for regression tests
  compare_dist.R       # Test helper for single variable analysis
  gen_true_values.R    # Generate test input
  make_summary.py      # Generate an HTML report for the regtest
  rappor_sim.py        # RAPPOR client simulation
  regtest_spec.py      # Specification of test cases
  ...
build.sh               # Build scripts (docs, C extension, etc.)
demo.sh                # Quick demonstration
regtest.sh             # End-to-end regression tests, including client libraries and analysis
run.sh                 # Misc automation
setup.sh               # Install dependencies (for Linux)
test.sh                # Test runner
- RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response
- Building a RAPPOR with the Unknown: Privacy-Preserving Learning of Associations and Data Dictionaries
- Google Blog Post about RAPPOR
- RAPPOR implementation in Chrome
  - This is a production-quality C++ implementation, but it's somewhat tied to Chrome and doesn't support all privacy parameters (e.g. only a few values of p and q). On the other hand, the code in this repo is not yet production quality, but supports experimentation with different parameters and data sets. Of course, anyone is free to implement RAPPOR independently as well.
- Mailing list: email@example.com