Collecting personally identifiable information (PII) on data subjects
has become big business. Data brokers and data processors are part
of a multi-billion-dollar industry that profits from collecting, buying,
and selling consumer data. Yet there is little transparency in the
data collection industry, which makes it difficult to understand what
types of data are being collected, used, and sold, and thus to assess
the risk to individual data subjects. In this study, we examine a large textual
dataset of privacy policies from 1997 to 2019 to investigate
the data collection activities of data brokers and data processors. We
also develop an original lexicon of PII-related terms representing
PII data types curated from legislative texts. This mesoscale analysis
examines privacy policies at the word, topic, and network levels to
understand how their stability, complexity, and sensitivity change over
time. We find that (1) privacy legislation may be correlated with changes
in the stability and turbulence of PII data types in privacy policies;
(2) the complexity of privacy policies decreases over time as they become
more regularized; and (3) sensitivity rises over time, with spikes that
appear to be correlated with the introduction of new privacy legislation.