Encryption for the masses? An analysis of PGP key usage

Despite the rise of alternatives, email remains integral to technology-mediated communication. To protect email privacy, the encryption software Pretty Good Privacy (PGP) has been considered the first choice for individuals since 1991. However, there is little scholarly insight into the characteristics and motivations of the people using PGP. We seek to shed light on social aspects of PGP: who is using PGP for encrypted email communication, how and why? By understanding those using the technology, questions on the motivations, usability, and the political dimension of communication encryption can be contextualized and cautiously generalized to provide input for the design of privacy-enhancing technologies. We have greatly extended the scale and scope of existing research by conducting a PGP key analysis on 4.27 million PGP public keys complemented by a survey filled out by former and current PGP users (N = 3,727). We show that a relatively small homogeneous population of mainly western, technically skilled, and moderately politically active males is using PGP for privacy self-management. Additionally, findings from existing research identifying poor usability and a lack of understanding of the underlying mechanisms of PGP can be confirmed.


Introduction
The means and tools of communication are changing over time. Despite the rising popularity of instant messaging on mobile devices within the last decade, email continues to play a significant role as a digital communication technology in everyday life. As of 2016, using email was the most frequently performed online activity worldwide. By providing extensive empirical evidence, we greatly extend the scale and scope of existing research. Our work is based on an analysis of 4.27 million PGP public keys and complemented by a survey filled out by 3,727 current and former PGP users. Our work suggests that a relatively small homogeneous population of mainly western, technically skilled, and moderately politically active males is using PGP for encrypted email. Additionally, findings from existing research that identify poor usability and a lack of understanding of the underlying mechanisms of PGP can be confirmed. Our work contributes to discussions in computer-mediated communication and human-computer interaction and aims at supporting those who design technological solutions for individual privacy protection by providing insights into the hurdles for privacy technology adoption.
The study is divided into five parts: (i) the background on PGP, (ii) a review of related literature, (iii) an explanation of the methodology, (iv) an analysis and (v) a discussion of the results.

Background
PGP follows the community-led Internet standard OpenPGP (Callas et al. 2007) and is implemented in numerous software packages, such as the widely known GNU Privacy Guard (GPG). To use PGP for communications, both the sender and receiver need to self-generate a private and a public key. The sender encrypts an email with the receiver's previously shared public key. Only with the corresponding private key can the receiver decrypt and read the message.
By providing digital signatures, PGP allows users to detect if data has been altered after being signed and to determine if the data was signed by the person who claimed to do so. Data can be both encrypted and signed, or either independently. For instance, open source software packages are often signed to guarantee data integrity and authentication.
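The encryption and signing steps described above can be illustrated with a deliberately simplified sketch. It uses textbook RSA with two small, publicly known Mersenne primes; this is an assumption made purely for illustration and is not how OpenPGP implementations such as GPG work in practice (they use much larger keys, padding, and a hybrid scheme with a symmetric session key). All function names are hypothetical.

```python
import hashlib

# Toy key pair built from two known Mersenne primes (illustrative only,
# far too small and unpadded for any real use).
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                                # public modulus
E = 65537                                # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))        # private exponent (Python 3.8+)


def encrypt(message: int) -> int:
    """Encrypt with the receiver's PUBLIC key (E, N)."""
    return pow(message, E, N)


def decrypt(ciphertext: int) -> int:
    """Decrypt with the receiver's PRIVATE key (D, N)."""
    return pow(ciphertext, D, N)


def digest_as_int(data: bytes) -> int:
    """Hash the data and reduce it so it fits below the modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Sign the digest with the signer's PRIVATE key."""
    return pow(digest_as_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Check the signature with the signer's PUBLIC key."""
    return pow(signature, E, N) == digest_as_int(data)


# Encrypt a short message for the key owner and decrypt it again.
plaintext = int.from_bytes(b"hello", "big")
assert decrypt(encrypt(plaintext)) == plaintext

# Sign a software package and detect a later modification.
package = b"release-1.0.tar"
signature = sign(package)
print(verify(package, signature))
print(verify(package + b" (modified)", signature))
```

The sketch separates the two roles of a key pair: the public half is used to encrypt and to verify, the private half to decrypt and to sign.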
Moreover, PGP contains a mechanism to express trust in other people's keys by explicitly signing them. This understanding of trust reflects the verification of a key rather than trust in a person. For example, if key A trusts key B, key C trusting key A could then decide to trust key B as well. A trust path from C to A to B would emerge. As a whole, signatures and trust paths are referred to as the 'web of trust'.
Public keys may be shared in person or uploaded to PGP public key servers. Once uploaded, keys cannot be deleted, which is intended to prevent adversaries from uploading forged keys. To signal the validity of a key, users may mark keys or signatures as 'revoked' or assign an expiration date after which the key is marked as invalid and should no longer be used (Garfinkel 1995; Zimmermann 1995).
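The trust path in the example above can be sketched as a search in a directed graph in which an edge points from a signing key to the key it signed. The graph layout and the function name are assumptions for illustration; real OpenPGP trust calculations involve additional parameters such as trust levels and path-length limits.

```python
from collections import deque


def find_trust_path(signatures, start, target):
    """Breadth-first search for the shortest chain of signatures.

    `signatures` maps a key to the list of keys it has signed.
    Returns the path as a list of keys, or None if no path exists.
    """
    parents = {start: None}
    queue = deque([start])
    while queue:
        key = queue.popleft()
        if key == target:
            # Walk back through the parents to reconstruct the path.
            path = []
            while key is not None:
                path.append(key)
                key = parents[key]
            return path[::-1]
        for signed in signatures.get(key, []):
            if signed not in parents:
                parents[signed] = key
                queue.append(signed)
    return None


# Key C signed key A, and key A signed key B (the example from the text).
web_of_trust = {"C": ["A"], "A": ["B"], "B": []}
print(find_trust_path(web_of_trust, "C", "B"))  # ['C', 'A', 'B']
print(find_trust_path(web_of_trust, "B", "C"))  # None
```

A key that has neither signed nor been signed by any other key has no edges in this graph and is therefore unreachable from every other key.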

Related Work
Literature touching upon the social aspects of PGP can broadly be divided into two themes: research on the web of trust focusing on the collective level, and research on usability focusing on the individual level. Studies on the web of trust mostly focus on a subset of keys, the strongly connected component, that is, the largest connected group of signed keys. Within it, a 'small-world phenomenon' characterized by short trust path lengths could be detected (Čapkun, Buttyan, Hubaux 2002; Ulrich et al. 2011). Warren, Wilkinson and Warnecke (2007) found that the overall connectivity of the web of trust increased over time, while social distance between subgroups slightly decreased. These changes can be traced back to real-life events where people can sign each other's keys, such as Linux conferences. This level of analysis reveals that PGP is at least being used by a connected technical audience interacting with each other through signatures.
With regard to the usability of PGP, Whitten and Tygar (1999) found that it was too complicated for most users. In a usability test, most participants failed to perform basic tasks such as sending encrypted emails and made dangerous errors such as sharing their private keys. In a replication study, Sheng et al. (2006) found that the updated user interface still impaired usability and criticized it for providing little feedback to users. A more recent study concluded that modern PGP clients are still not sufficiently intuitive or usable (Ruoti et al. 2016). In addition to identifying the user interface as a source of problems, Garfinkel and Miller (2005) argue that trust is interpreted differently by users, and most PGP implementations do not effectively communicate the PGP concept of trust in a key instead of a person. Lastly, it has been argued that different types of users, such as ephemeral or habitual users, have different needs that need to be reflected in the design of email encryption software (Gaw, Felten, Fernandez-Kelly 2006).
Besides usability, existing research provides little insight into how and why PGP is (not) being used for protecting communication. However, it provides grounds for assuming that PGP is used by experienced users and is thus not suitable for the average user.

Methodology
We conducted a macro-level analysis of all globally published PGP public keys and subsequently launched a complementary survey approved by our Institutional Review Board. A publicly available data set containing 4,270,992 public keys and their signatures as of 13/05/2016 was obtained from pgp.key-server.io. We disregarded keys with implausible creation dates before 1991 and after May 2016 as well as keys having no or malformed email addresses, leaving us with 4,113,983 keys for analysis (see Appendix B). The survey was conducted in English (see Appendix A). Owing to different survey flows, a participant was asked a maximum of 19 questions, and the survey took approximately 10 minutes to complete. Informed consent and certification of legal age were required. The option to withdraw at any time without penalty was offered. Participants could select a 'Prefer not to say' option for all questions. No financial incentives for participation were awarded.
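The key-filtering step described above could look roughly like the following sketch. The record layout, the exact cut-off dates, and the email pattern are assumptions for illustration; the criteria actually applied to the key server dump may have been stricter or looser.

```python
import re
from datetime import date

# Assumed plausibility window: PGP was first released in 1991,
# and the data set was obtained in May 2016.
EARLIEST = date(1991, 1, 1)
LATEST = date(2016, 5, 31)

# Deliberately simple pattern: something@something.tld without spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_plausible(key: dict) -> bool:
    """Keep a key only if its creation date and email address look valid."""
    if not (EARLIEST <= key["created"] <= LATEST):
        return False
    email = key.get("email")
    return bool(email) and EMAIL_PATTERN.match(email) is not None


# Hypothetical records standing in for parsed key server entries.
keys = [
    {"created": date(1998, 3, 14), "email": "alice@example.org"},
    {"created": date(1970, 1, 1), "email": "bob@example.org"},    # too early
    {"created": date(2031, 7, 2), "email": "carol@example.org"},  # in the future
    {"created": date(2014, 6, 9), "email": "not-an-address"},     # malformed
    {"created": date(2014, 6, 9), "email": None},                 # missing
]

kept = [key for key in keys if is_plausible(key)]
print(len(kept), "of", len(keys), "keys kept")
```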
Almost all survey questions were developed from scratch, including questions on demographics that had to be adapted to a global scale of an unknown population. We conducted two cognitive interviews with PGP users and launched a pilot survey (N = 52) to test and refine the questionnaire. Some pilot participants could not load the Qualtrics survey design due to privacy-preserving web browser plugins, resulting in Likert-scale questions becoming unreadable. Hence, all such items were converted or removed.
Based on the response rate of the pilot survey, we randomly selected 200,000 email addresses to obtain a representative sample. To minimize the probability of sampling the same participant more than once, we combined distinct keys with the same email address (1,062,880 keys), leaving us with 3,051,103 to sample from. To reach ex-users as well, we did not encrypt our invitation email. Unfortunately, we could not sign our invitation due to technical limitations of the survey platform. A total of 59,254 invitation emails could not be delivered (29%). Eventually 3,787 surveys were completed, resulting in a 2.6% response rate after removing ineligible participants who did not meet the legal age requirement or had never used PGP. This response rate is not unusual for surveys where participants are invited over email (Schonlau, Fricker Jr., Elliot 2002).
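The sampling arithmetic above can be retraced in a few lines. The sketch assumes that the reported response rate relates the 3,727 eligible respondents given in the abstract to the invitations that were actually delivered; the helper `draw_sample` and its case-insensitive merging of addresses are hypothetical and only illustrate the idea of sampling each address once.

```python
import random

KEYS_ANALYZED = 4_113_983
MERGED_DUPLICATES = 1_062_880   # keys sharing an address with another key
INVITED = 200_000
UNDELIVERED = 59_254
ELIGIBLE_RESPONSES = 3_727      # assumption: N reported in the abstract

sampling_frame = KEYS_ANALYZED - MERGED_DUPLICATES   # 3,051,103 addresses
delivered = INVITED - UNDELIVERED                    # 140,746 invitations
undelivered_share = UNDELIVERED / INVITED            # just under 0.30
response_rate = ELIGIBLE_RESPONSES / delivered       # about 0.026


def draw_sample(key_emails, size, seed=0):
    """Merge keys by email address, then sample without replacement."""
    unique = sorted({email.strip().lower() for email in key_emails})
    return random.Random(seed).sample(unique, min(size, len(unique)))


print(f"sampling frame: {sampling_frame:,}")
print(f"delivered invitations: {delivered:,}")
print(f"response rate: {response_rate:.1%}")
print(draw_sample(["a@example.org", "A@example.org", "b@example.org"], 2))
```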
It is worth noting that we received about 370 response emails and three phone calls. Respondents were positive about the project, asking for more information, asking why the invitation was neither PGP encrypted nor signed, trying to verify our identity, or complaining about the receipt of an unsolicited invitation email. One formal complaint was filed and resolved with the Institutional Review Board. This unusual reaction can be traced back to the scale of the survey, but also to a privacy- and security-aware population that is more suspicious of unexpected emails (Wright, Marett 2010).
On the participant side, we acknowledge several limitations. Firstly, we could only contact those who published their public keys online; therefore, our findings may not be fully applicable to those who only share their keys in person. Secondly, a non-response bias due to suspicions can be attributed to technical aspects of the invitation email. For instance, the invitation email contained an individual survey link that looked suspect to potential respondents. Moreover, we assume that those who did not feel safe to share information about themselves did not participate. It is possible that this applies more often to people living outside Europe or North America. Thirdly, the telescoping error, which is the "tendency of respondents to report events as occurring earlier or later than they occurred" (Eisenhower, Mathiowetz, Morganstein 2004, p. 135), might have affected long-term users. Finally, especially with questions regarding opinions about and motivations to use PGP, respondents might have suffered a recall decay, that is, the inability to properly recall relevant events and associated memories. To reduce the chance of detecting non-existent associations resulting from this bias in the analysis stage, significance levels are set to 99% (α = 0.01) instead of the social science convention of 95% (α = 0.05). Nevertheless, readers are urged to interpret the following results with caution.
Downloaded from the journal Mediatizations Studies, http://mediatization.umcs.pl, date: 06/09/2019 16:37:23, UMCS

PGP key analysis
As it is impossible to answer 'why' PGP is being used by conducting a key analysis, we focused on the 'who' and 'how' by looking at the distribution of keys over time, first names by gender, a geographical approximation based on email addresses, as well as on the use of keys and signatures. Figure 1 depicts the distribution of key creations per year. A large number of keys, between 250,000 and 350,000 per year, were created between 1997 and 2001. Thereafter, key creations became less frequent, with about 100,000 to 150,000 keys per year, before taking off again in 2013. Users can self-generate more than one key; hence, the total number of keys does not translate into the number of users. As we merged keys by email addresses to identify users with several keys, we observed that there was a large number of keys with distinct email addresses. If we assumed that each email address belonged to a different user, this number would indicate a large number of different PGP users having one key each (Median = 1.0, Mean = 1.3, SD = 1.4, N = 3,065,428). Counting all keys that were neither expired nor revoked as of May 2016, a total of 3.67 million keys could be considered active. While these figures indicated a large number of individual users, in the context of the 27 years of PGP existence, the numbers were actually quite low, especially given that there were 3.39 billion Internet users in 2016 when we conducted the study (ITU 2016).
With regard to signatures, 4.97 million were created in total. However, out of all keys, only about 600,000 (15%) had been signed, and only 400,000 (10%) had signed other keys, resulting in a huge disproportion of signing and signed keys. This indicates that the 'small-world phenomenon' among signed keys detected in earlier research might very well be observed amongst those keys which signed and were signed. However, due to the lack of signatures, the vast majority of PGP users were not part of the web of trust, but disconnected from all other keys.
We continued our analysis by looking at the gender distribution of PGP users. While acknowledging that there are more than two distinct genders, an approximation of first names by these binary categories should provide an estimation of the gender balance of PGP users. The first names of all User IDs were analyzed using the names corpus (Kantrowitz, Ross 1994). First names such as Christian or Kim were categorized both as male and female. To mitigate bias, two searches were run, first looking up male and then female names, and vice versa. The mean of both runs was skewed towards male users: 83.4% had male first names, while 16.4% had female first names (N = 1,874,652). This was only an imprecise approximation, as not all keys contained a first name, and in other cases it could not be detected as pseudonyms were often used (Orman 2015).
Thereafter we were interested in finding out where users came from. We examined email addresses as a rough estimation of location; however, this did not yield a sufficiently useful approximation of where users reside. Email addresses are registered with an email provider that is addressable by a domain name, such as gmail.com. However, top-level domains are a weak approximation of location, as global email services mostly offer .com addresses, and it is often possible to register country-code top-level domains even as a non-resident.
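One way to read the two-pass lookup described above is sketched below: names that appear in both lists are assigned to whichever list is consulted first, and the two resulting shares are averaged. The tiny name sets are hypothetical stand-ins for the names corpus, and the function names are illustrative assumptions rather than the procedure actually used.

```python
# Hypothetical miniature name lists; 'christian' and 'kim' appear in both.
MALE_NAMES = {"christian", "kim", "peter", "john"}
FEMALE_NAMES = {"christian", "kim", "anna", "maria"}
NAME_LISTS = {"male": MALE_NAMES, "female": FEMALE_NAMES}


def classify(first_name, lookup_order):
    """Return the first category whose list contains the name, else None."""
    name = first_name.lower()
    for category in lookup_order:
        if name in NAME_LISTS[category]:
            return category
    return None  # pseudonym or name missing from both lists


def male_share(first_names, lookup_order):
    """Share of male labels among all names that could be classified."""
    labels = [classify(name, lookup_order) for name in first_names]
    known = [label for label in labels if label is not None]
    if not known:
        return 0.0
    return sum(label == "male" for label in known) / len(known)


def averaged_male_share(first_names):
    """Average the male-first and female-first runs to offset ambiguity."""
    male_first = male_share(first_names, ("male", "female"))
    female_first = male_share(first_names, ("female", "male"))
    return (male_first + female_first) / 2


user_ids = ["Peter", "Kim", "Anna", "John", "xX_r00t_Xx"]
print(averaged_male_share(user_ids))  # 0.625
```

In this toy input, the ambiguous name counts as male in one run and as female in the other, so it contributes half a person to each category in the averaged result; the unclassifiable pseudonym is excluded from the denominator.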

PGP survey analysis
To validate and back up the findings from the key analysis, we turn to the complementing survey. All sample demographics are given in Table 1. In terms of gender, we could see a large number of males (94.9%), over 10 percentage points higher than in the first name analysis. The vast majority of respondents currently resided in Europe or North America. We also learned that most respondents were under 55 years of age, with about two-thirds between 25 and 44 years of age. The sample was highly educated, with 87.2% having university education or equivalent. The primary work sectors were related to IT (58.4%), followed by science and education (19.7%). The survey showed that 75.7% of respondents were actively using PGP, while 24.3% had given up. The respondents stated having four keys on average (Median = 4.0, Mean = 5.2, SD = 18.6, N = 3,414). This differed significantly from the mean of one key per person indicated by the key analysis (t(3413) = 13.58, p < 0.01, Power = 1.0), providing strong support for the assumption that most users had more than one PGP key, and countering our key analysis result.
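The comparison above is a one-sample t-test, whose statistic can be computed from summary values alone. The sketch below uses the rounded figures from the text and takes the key-analysis mean of 1.3 as the reference value, which is an assumption; because the inputs are rounded, the result only approximates the reported statistic.

```python
import math


def one_sample_t(sample_mean, sample_sd, n, reference_mean):
    """t = (mean - reference) / (sd / sqrt(n)), with n - 1 degrees of freedom."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - reference_mean) / standard_error


# Rounded survey summary: keys per respondent.
n = 3414
t_keys = one_sample_t(sample_mean=5.2, sample_sd=18.6, n=n, reference_mean=1.3)
print(f"t({n - 1}) = {t_keys:.2f}")
```

With more than 3,000 respondents the standard error is small, so even a difference of a few keys yields a large t value despite the wide spread in self-reported key counts.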
About three quarters of the respondents used their key for personal reasons (77.1%), followed by 39.7% for professional reasons, and 8.5% for other uses (e.g. software development). Email encryption and decryption was the activity most often carried out with PGP (90.7%), followed by signing and verifying other data, such as downloads (65.3%), as well as data encryption and decryption (55.5%). About half (50.7%) engaged in signing or verifying other people's keys. This did not reflect the 15% of signed keys seen in the key analysis. Rather, this observation indicated that a large number of users verified the authenticity of keys manually, but seldom issued signatures. When using PGP for email encryption, the vast majority did so with only some of their contacts (84.5%), 3.3% with most, and a minority (0.5%) with all.
In explaining why they chose PGP technology in the first place, the majority of respondents indicated that they merely wanted to try PGP out of curiosity (71.2%), followed by those who were responding to government activities such as surveillance (31.2%). Notably, only 29.1% of users started with PGP as a means to contact one or more people confidentially. An additional 27.9% began using PGP spurred on by the example of their peers. For 19.2% of respondents PGP was required for work.
The vast majority of former users gave up using PGP because they had no one else to communicate with using encryption (72.4%), had no need to encrypt information (25.3%), felt it lacked an intuitive software interface (23.8%), or had no PGP availability on their platforms (such as mobile) (23.6%). The lack of a communication partner indicated the missing network effect of PGP. This is especially consequential given that email encryption is the most reported use.
To understand why only about 15% of keys participate in the web of trust, we asked about the motivations not to sign or verify keys. As seen before, the most prominent reason is that respondents have nobody else to communicate with using PGP (47.2%). About a quarter of respondents (22.8%) had no need for signatures, and 16.7% did not fully understand how signatures worked. Our results only partially support findings from the literature that the main barrier to participation in the web of trust is the lack of understanding of the underlying trust model by PGP users.
Due to the earlier-mentioned political use, it is worth investigating the political activities of PGP users to understand whether email encryption might be more prevalent in political contexts (Figure 2). To identify underlying factors for political activity, which we conceptualized as civic engagement, we conducted a Principal Component Analysis on the political variables 'engagement with political advocacy' and 'political activities in the last year'. Regarding engagement with advocacy organizations, respondents were mostly engaged with technology-oriented organizations such as hackerspaces. This, again, speaks to a technically skilled PGP user base. Apart from technology-oriented associations, only a minority of respondents were active in advocacy. Political activities in the previous year were more evenly balanced than engagement with advocacy organizations. Respondents were less formally organized, but seem nonetheless moderately politically active.
Turning to the analysis, the 'None of the above' options were excluded from the beginning, as they already indicated non-activity. After eliminating three variables that did not load above 0.3 on any of the components (Blank, Groselj 2014), nine variables were used for the Principal Component Analysis. Throughout, varimax rotations and Kaiser's criterion were applied to identify two components (see Table 2).
The first component clusters more online-based activities, such as forwarding or sharing political news or funny political content, as well as commenting on political issues online. Signing a petition is the lowest loading on this factor, but in this case might also refer to online petitions. From this component, we constructed an online political activity scale, ranging from 0 (low activity) to 4 (high activity), with 2.5 being the cutoff for low and high (Median = 1.0, Mean = 1.7, SD = 1.4, N = 3,285). There was a significant difference in the scores for online political activity being ≤ 2.5, the mean value of the scale (t(3284) = −11.39, p < 0.01, Power = 1.0). This confirms low online political activity among PGP users.
The second component comprises advocacy and offline political activity, which is why membership in different political organizations and taking part in a public demonstration or march are clustered together. Notably, being part of a technology-oriented association did not load high on either component, and thus cannot be treated as political activity per se. The second component allowed for an offline political activity scale ranging from 0 (low activity) to 5 (high activity), with 0-2 as low levels of activity and 3-5 as high levels of activity (Median = 0.0, Mean = 0.6, SD = 0.9, N = 3,285). As with online political activity, there was also a significant difference in the scores for ≤ 3, the mean value of the scale (t(3284) = −0.0012, p < 0.01, Power = 1.0). This similarly confirms low offline political activity among PGP users. With regard to demographics, there are no significant differences for either type of political activity. These results suggest that, contrary to our assumption, the population of PGP users is not very politically active overall, neither online nor offline.

Discussion
Returning to the literature on the web of trust, we need to stress that the vast majority of key holders do not actively participate in the web of trust due to the lack of communication partners using PGP. However, for those who do, the 'small-world phenomenon' found in the past can now be explained from a social perspective. Only a minority of users actively engage in signing keys, as at least for some there are barriers to understanding the concept. As Warren, Wilkinson and Warnecke (2007) argued, signing keys may often take place at social events, such as Linux conferences. Such gatherings could most likely be classified as meetings of people with high technical skills, inasmuch as the participants have other people to communicate with, use PGP for software development, and, additionally, meet at least the skills and probably other demographic criteria observed above.
In a global context, PGP users are far from representative. Globally, 49.6% of humans are women, 61% of the population is aged between 15 and 59 (as compared to 91.3% of PGP users being less than 55 years old), and merely 14.7% of the world population live in Europe or North America compared to 90.0% of the surveyed PGP users. The level of education and primary work sector of respondents is equally unrepresentative (UN 2017; World Bank 2016). With regard to Internet users, the sample population is skewed a little less. In 2014, Internet penetration rates were highest in Europe (79.1%) and the Commonwealth of Independent States (66.6%), followed by 65% in the Americas (ITU 2016). As a result, the sample is disproportionately skewed.
It is worth noting that in some regions, such as China, instant messaging is the prevalent form of technology-mediated communication, rather than email (Horowitz 2017). Moreover, we could only find limited PGP software for Asian languages. In addition to our English-only questionnaire, this might partially explain why we did not see a large Asian participation in encrypted email communication.
Placed in a historic context, exporting PGP outside the US was initially classified as an unlicensed munitions export. In consequence, the software's developer was investigated by the US government, calling attention to PGP and the regulation of cryptography in general. Together with other issues related to encryption, individuals and industry challenged the US government to remove any export restrictions on cryptography in the 1990s, a period known as the Crypto Wars (Kehl, Wilson, Bankston 2015). More recently, in light of global government surveillance revelations (Greenwald 2014), this inherent political dimension to PGP might have re-emerged. These two political periods or events correlate with PGP key generations, peaking at the end of the 1990s and again after 2013 (see Figure 1). Yet, our results suggest that PGP is not used by highly politically active people, at least when conceptualizing political activity by civic engagement. But given that PGP "empowers people to take their privacy into their own hands" (Zimmermann 1999), using the software to protect one's privacy and to challenge governments might itself be seen as an inherent political act to reclaim privacy, even by those who are not otherwise intensely politically active. This contrast can be conceptualized with Kubitschko's (2017) distinction of acting with media and acting on media. Primarily, we focused on how and in which political contexts people tend to use PGP (acting with media). Yet, the engagement by tech-savvy people with a similar social network using encrypted communication to deal with and evade surveillance itself could be understood as political agency (acting on media).
Using PGP to encrypt online communications could thus be seen as an act of "privacy literacy" (Debatin 2011), where users inform themselves on how to protect their communications and take action using a privacy-enhancing technology in order to escape the "surveillance-industrial complex" (Trottier, Fuchs 2015). Interestingly, the self-perception of the use of PGP being a political act is not pronounced among PGP users, as only one third of respondents claimed that they did so in response to government actions, as compared to more than two thirds claiming that they got started with PGP because they wanted to try it out.
Using encryption for mediated communication can be perceived as a form of socio-cultural practice that presupposes a collectively generated social, cultural, legal, and technological infrastructure that can be relied on to exercise privacy self-protection practices. Empirical evidence shows that this practice is not widespread. Matzner et al. (2016) take a step back and ask whether individuals should be responsible for privacy self-protection at all. They argue that as long as data protection "is not considered a collective, profoundly political endeavor", privacy self-management is an "ill-fated practice" (p. 303).
In fact, most former users stated that they stopped using PGP because they had no one to communicate with. This may indirectly confirm research on the user interface, which finds that it impairs usability. If PGP is really hard to use, users might find only few communication partners for encrypted email communication. In other words, if privacy protection is not designed for, or activated by default to reach, a large user base, its impact will be limited.
We found 4.11 million PGP keys published on the key servers containing an email address. Of the 200,000 invitation emails sent, 29% could not be delivered. Extrapolated, at least 1.2 million keys may not be in use any more due to email mortality. Given that users own on average four keys, there might only be about 730,000 users of PGP. After subtracting the 25% of ex-users, PGP might possibly draw on as few as 550,000 active users.
While in nearly three decades PGP may have reached fewer than one million users and has therefore not achieved encryption for the masses, the mobile communication application WhatsApp reached a billion people in only a few years (Metz 2016). In contrast to email, commercial market solutions like WhatsApp build encryption into their instant messaging products and enable it by default. In the context of the emerging use of mobile devices and instant messaging, unprotected communication gradually ceases to exist, providing users with private communications by default.
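The back-of-the-envelope estimate above can be retraced step by step. The sketch takes the exact count of 4,113,983 analyzed keys from the methodology together with the rounded shares from the text; the function and parameter names are illustrative.

```python
def estimate_active_users(keys_with_email, undeliverable_share,
                          keys_per_user, ex_user_share):
    """Chain the three extrapolation steps described in the text."""
    # Step 1: discount keys whose email address no longer works.
    keys_in_use = keys_with_email * (1 - undeliverable_share)
    # Step 2: convert keys to people using the average keys per user.
    users = keys_in_use / keys_per_user
    # Step 3: remove those who reported having given up PGP.
    active_users = users * (1 - ex_user_share)
    return keys_in_use, users, active_users


keys_in_use, users, active_users = estimate_active_users(
    keys_with_email=4_113_983,
    undeliverable_share=0.29,
    keys_per_user=4,
    ex_user_share=0.25,
)
print(f"keys presumably out of use: {4_113_983 - keys_in_use:,.0f}")
print(f"estimated users:            {users:,.0f}")
print(f"estimated active users:     {active_users:,.0f}")
```

Each step multiplies rounded survey shares, so the result is best read as an order-of-magnitude figure rather than a precise count.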

Conclusion
PGP was developed to safeguard privacy in digital communication. Yet, it requires a specific technical skill set and social environment. We found that PGP is used by a well-educated, technologically curious and skilled homogeneous male minority. Moreover, we estimate that there might be as few as 550,000 active PGP users. Based on the demographics and key usage, it can be concluded that, practically speaking, PGP does not provide encryption for the masses. To reach the masses, evidence suggests that encryption should be easy to understand and enabled by default. Otherwise, there is the risk of inequalities in privacy protection between those who can actively self-manage their privacy and those who cannot. With the shift from email to other kinds of technology-mediated communication such as mobile instant messaging, the gap regarding the protection of communication privacy is already partially decreasing.
Turning to the political dimension of technology-mediated communication, PGP users might not be overly politically active when taking civic engagement as a basis, even though one third started using PGP as a reaction to government actions such as surveillance. In this sense, deliberately using software to protect one's digital communication to oppose governments might itself be a political act to reclaim privacy. In contrast, there are more user-friendly corporate-owned market solutions like WhatsApp providing encryption by default. However, it can be questioned whether using such out-of-the-box solutions is a comparable political act in the sense of privacy self-management. Thus, we suggest further research to investigate how the practice of privacy protection is being re-negotiated by shifting from an act of privacy self-protection requiring a certain skill set to a market-provided solution, and consequently how such developments contribute to privacy protection from a broader socio-political point of view.