Internet Research Ethics

First published Fri Jun 22, 2012; substantive revision Tue Jan 12, 2021

There is little research today that is not affected in some way by, or conducted through, the Internet. The Internet, as a field, a tool, and a venue, has specific and far-reaching ethical issues. Internet research ethics is a subdiscipline that cuts across many disciplines, ranging from the social sciences, arts and humanities, and medical/biomedical fields to the natural sciences. Extant ethical frameworks, including consequentialism, deontology, virtue ethics, and feminist ethics, have contributed to the ways in which ethical issues in Internet research are considered and evaluated.

Conceptually and historically, Internet research ethics is most closely related to computer and information ethics and includes such ethical issues as participant knowledge and consent, data privacy, security, anonymity and confidentiality, integrity of data, intellectual property issues, and community, disciplinary, and professional standards or norms. Throughout the Internet’s evolution, there has been continued debate over whether new ethical dilemmas are emerging, or whether the existing dilemmas are similar to dilemmas in other research realms (Elgesem 2002; Walther 2002; Ess & AoIR 2002; Markham & Buchanan 2012). These debates are similar to philosophical debates in computer and information ethics. For example, many years ago, James Moor (1985) asked “what is special about computers” in order to understand what, if anything, is unique ethically. Reminding us, however, that research itself must be guided by ethical principles, regardless of technological intervention, van Heerden et al. (2020) and Sloan et al. (2020) stress that the “fundamental principles of conducting ethical social research remain the same” (Ess & AoIR 2002; King 1996; Samuel & Buchanan 2020).

Yet, as the Internet has evolved into a more social and communicative tool and venue, the ethical issues have shifted from purely data-driven to more human-centered. “On-ground” or face-to-face analogies, however, may not be applicable to online research. For example, the public park has been used as an analogy for a site where researchers might observe others with little ethical controversy, but online, the concepts of public versus private are much more complex (SACHRP 2013). Thus, some scholars suggest that the specificity of Internet research ethics calls for new regulatory and/or professional and disciplinary guidance. For these reasons, the concept of human subjects research, and the policies and regulations surrounding it, informs this entry, which will continue discussions around ethical and methodological complexity, including personal identifiability, reputational risk and harm, notions of public space and public text, ownership, and longevity of data as they relate to Internet research. Specifically, the emergence of the social web raises issues around subject or participant recruitment practices, tiered informed consent models, and protection of various expectations and forms of privacy in an ever-increasing world of diffused and ubiquitous technologies. Additional ethical concerns center on issues of anonymity and confidentiality of data in spaces where researchers and their subjects may not fully understand the terms and conditions of those venues or tools; challenges to data integrity as research projects can be outsourced or crowdsourced to online labor marketplaces; and jurisdictional issues as more research is processed, stored, and disseminated via cloud computing or in remote server locales, presenting myriad legal complexities given jurisdictional differences in data laws. Further, the dominance of big data research has continued across research spaces, with the notions of “real-world data” and pervasive computing readily accepted and used in all disciplines.
The ease of access and availability to use big data sets in myriad ways has enabled AI (artificial intelligence) and ML (machine learning) to grow as standard tools for researchers.

As a result, researchers using the Internet as a tool for and/or a space of research—and their research ethics boards (REBs), also known as institutional review boards (IRBs) in the United States or human research ethics committees (HRECs) in other countries such as Australia—have been confronted with a series of new ethical questions: What ethical obligations do researchers have to protect the privacy of subjects engaging in activities in “public” Internet spaces? What are such public spaces? Is there any reasonable expectation of privacy in an era of pervasive and ubiquitous surveillance and data tracking? How can confidentiality or anonymity be assured online? How is informed consent obtained online, and how should it be? How should research on minors be conducted, and how does one verify that a subject is not a minor? Is deception (pretending to be someone you are not, withholding identifiable information, etc.) an acceptable online norm or a harm? How is “harm” possible to someone existing in an online space? How identifiable are individuals in large data sets? Do human subjects protections apply to big data? As more industry-sponsored research takes place, what ethical protections exist outside of current regulatory structures? As laws such as the EU’s General Data Protection Regulation (GDPR 2016) are enacted, what are the global implications for data privacy and individual rights?

A growing number of scholars have explored these and related questions (see, for example, Bromseth 2002; Bruckman 2006; Buchanan 2004; Buchanan & Ess 2008; Johns, Chen & Hall 2003; Kitchin 2003, 2008; King 1996; Mann 2003; Markham & Baym 2008; McKee & Porter 2009; Thorseth 2003; Ess 2016; Zimmer & Kinder-Kurlanda (eds.) 2017; Samuel & Buchanan 2020), scholarly associations have drafted ethical guidelines for Internet research (Ess & Association of Internet Researchers 2002; Markham, Buchanan, and AoIR 2012; franzke et al. 2020; Kraut et al. 2004), and non-profit scholarly and scientific agencies such as AAAS (Frankel & Siang 1999) are confronting the myriad ethical concerns that Internet research poses to researchers and research ethics boards (REBs).

Given that over 50% of the world population uses the Internet, and that 97% of the world population now lives within reach of a mobile cellular signal and 93% within reach of a 3G (or higher) network (International Telecommunication Union 2019), continued exploration of the ethical issues related to research in this heavily mediated environment is critical.

1. Definitions

The commonly accepted definition of Internet research ethics (IRE) has been used by Buchanan and Ess (2008, 2009), Buchanan (2011), and Ess & Association of Internet Researchers (AoIR) (2002):

IRE is defined as the analysis of ethical issues and application of research ethics principles as they pertain to research conducted on and in the Internet. Internet-based research, broadly defined, is research which utilizes the Internet to collect information through an online tool, such as an online survey; studies about how people use the Internet, e.g., through collecting data and/or examining activities in or on any online environments; and/or, uses of online datasets, databases, or repositories.

These examples were broadened in 2013 by the U.S. Secretary’s Advisory Committee on Human Research Protections (SACHRP 2013), and included under the umbrella term Internet Research:

  • Research studying information that is already available on or via the Internet without direct interaction with human subjects (harvesting, mining, profiling, scraping, observation or recording of otherwise-existing data sets, chat room interactions, blogs, social media postings, etc.)
  • Research that uses the Internet as a vehicle for recruiting or interacting, directly or indirectly, with subjects (Self-testing websites, survey tools, Amazon Mechanical Turk, etc.)
  • Research about the Internet itself and its effects (use patterns or effects of social media, search engines, email, etc.; evolution of privacy issues; information contagion; etc.)
  • Research about Internet users: what they do, and how the Internet affects individuals and their behaviors
  • Research that utilizes the Internet as an interventional tool, for example, interventions that influence subjects’ behavior
  • Others (emerging and cross-platform types of research and methods, including m-research (mobile))
  • Recruitment in or through Internet locales or tools, for example social media, push technologies

A critical distinction in the definition of Internet research ethics is that between the Internet as a research tool versus a research venue. The distinction between tool and venue plays out across disciplinary and methodological orientations. As a tool, Internet research is enabled by search engines, data aggregators, digital archives, application programming interfaces (APIs), online survey platforms, and crowdsourcing platforms. Internet-based research venues include such spaces as conversation applications (instant messaging and discussion forums, for example), online multiplayer games, blogs and interactive websites, and social networking platforms.

Another way of conceptualizing the distinction between tool and venue comes from Kitchin (2008), who has referred to a distinction in Internet research using the concepts of “engaged web-based research” versus “non-intrusive web-based research:”

Non-intrusive analyses refer to techniques of data collection that do not interrupt the naturally occurring state of the site or cybercommunity, or interfere with premanufactured text. Conversely, engaged analyses reach into the site or community and thus engage the participants of the web source (2008: 15).

These two constructs provide researchers with a way of recognizing when consideration of human subjects protections might need to occur. McKee and Porter (2009), as well as Banks and Eble (2007), provide guidance on the continuum of human-subjects research, noting a distinction between person-based and text-based research. For example, McKee and Porter provide a range of research variables (public/private, topic sensitivity, degree of interaction, and subject vulnerability) which are useful in determining where on the continuum between text-based and person-based the research falls, and whether or not subjects would need to consent to the research (2009: 87–88).

While conceptually useful for determining human subjects participation, the distinction between tool and venue, or between engaged and non-intrusive web-based research, is increasingly blurring in the face of social media and their third-party applications. Buchanan (2016) has conceptualized three phases or stages of Internet research; the emergence of social media characterizes the second phase, circa 2006–2014. The concept of social media entails

A group of Internet-based applications that build on the ideological and technological foundations of Web 2.0, and that allow the creation and exchange of user-generated content (Kaplan & Haenlein 2010: 61).

A “social network site” is a category of websites with profiles, semi-persistent public commentary on the profile, and a traversable publicly articulated social network displayed in relation to the profile.

This collapse of tool and venue can be traced primarily to the increasing use of third-party sites and applications such as Facebook, Twitter, or any of the myriad online research tools where subject or participant recruitment, data collection, data analysis, and data dissemination can all occur in the same space. With these collapsing boundaries, the terms of “inter-jurisdictional coordination” (Gilbert 2009: 3) are inherently challenging; Gilbert has specifically argued against the terms of use or end-user license agreement stipulations in virtual worlds, noting that such agreements are often “flawed”, as they rely on laws and regulations from a specific locale and attempt to enforce them in a non-place-based environment. Nonetheless, researchers now make frequent use of data aggregation tools, scraping data from user profiles or transaction logs, harvesting data from Twitter streams, or storing data on cloud servers such as Dropbox, only after agreeing to the terms of service that go along with those sites. The use of such third-party applications or tools changes fundamental aspects of research, oftentimes displacing the researcher or research team as the sole owner of their data. These unique characteristics implicate concepts and practicalities of privacy, consent, ownership, and jurisdictional boundaries.

A key moment that typified and called attention to many of these concerns emerged with the 2014 Facebook Emotional Contagion study (Booth, 2014). By virtue of agreeing to Facebook’s Terms of Service, did users consent to participation in research activities? Should there have been a debriefing after the experiment? How thoroughly did a university research ethics board review the study? Should industry-sponsored research undergo internal ethics review? In response to the outcry over the Contagion study, OkCupid’s Christian Rudder (2014 [OIR]) defended these sorts of experiments, noting

We noticed recently that people didn’t like it when Facebook “experimented” with their news feed. Even the FTC is getting involved. But guess what, everybody: if you use the Internet, you’re the subject of hundreds of experiments at any given time, on every site. That’s how websites work.

The phenomenon of the social web forces an ongoing negotiation between researchers and their data sources, as seen in the Facebook contagion study and the subsequent reaction to it. Moreover, with the growing use and concentration of mobile devices, the notion of Internet research is expanding with a movement away from a “place-based” Internet to a dispersed reality. Data collection from mobile devices has increased exponentially. For example, mobile devices enable synchronous data collection and dissemination from non-place-based environments. Researchers using cloud-enabled applications can send and receive data to and from participants synchronously. The impact of such research possibilities for epidemiological research is staggering in its scientific potential, while the concurrent ethical challenges are equally demanding, as we are seeing with mobile-based COVID-19 research (Drew et al., 2020) and the sampling of subjects' current behaviors and experiences in real time (Hubach et al., forthcoming). As Internet research has grown from a niche methodology into a nearly ubiquitous and often invisible practice, the traditional concepts of human subjects research require careful consideration.

2. Human Subjects Research

The practical, professional, and theoretical implications of human subjects protections have been covered extensively in scholarly literature, ranging from medical/biomedical to social sciences to computing and technical disciplines (see Beauchamp & Childress 2008; Emanuel et al. 2003; PRIM&R et al. 2021; Sieber 1992; Wright 2006). Relevant protections and regulations continue to receive much attention in the face of research ethics violations (see, for example, Skloot 2010, on Henrietta Lacks; the U.S. Government’s admission and apology to the Guatemalan Government for STD testing in the 1940s (BBC 2011); and Gaw & Burns 2011, on how lessons from the past might inform current research ethics and conduct).

The history of human subjects protections (Sparks 2002 [see Other Internet Resources aka OIR]) grew out of atrocities such as Nazi human experimentation during World War II, which resulted in the Nuremberg Code (1947), subsequently followed by the Declaration of Helsinki on Ethical Principles for Medical Research Involving Human Subjects (World Medical Association 1964/2008). Partially in response to the Tuskegee syphilis experiment, an infamous clinical study conducted between 1932 and 1972 by the U.S. Public Health Service studying the natural progression of untreated syphilis in rural African-American men in Alabama under the guise of receiving free health care from the government, the U.S. Department of Health and Human Services put forth a set of basic regulations governing the protection of human subjects (45 C.F.R. § 46) (see the links in the Other Internet Resources section, under Laws and Government Documents). This was later followed by the publication of the “Ethical Principles and Guidelines for the Protection of Human Subjects of Research” by the National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research, known as the Belmont Report (NCPHSBBR 1979). The Belmont Report identifies three fundamental ethical principles for all human subjects research: Respect for Persons, Beneficence, and Justice.

To ensure consistency in human subjects protections across federal agencies in the United States, the Federal Policy for the Protection of Human Subjects, also known as the “Common Rule”, was codified in 1991; the Revised Common Rule was released in the Federal Register on 19 January 2017 and went into effect on 19 July 2018. Similar regulatory frameworks for the protection of human subjects exist across the world, and include, for example, the Canadian Tri-Council, the Australian Research Council, the European Commission, the Research Council of Norway and its National Committee for Research Ethics in the Social Sciences and Humanities (NESH 2006; NESH 2019), the U.K.’s NHS National Research Ethics Service and the Research Ethics Framework (REF) of the Economic and Social Research Council (ESRC), and the Forum for Ethical Review Committees in Asia and the Western Pacific (FERCAP).

In the United States, the various regulatory agencies that have signed on to the Common Rule (45 C.F.R. 46 Subpart A) have not issued formal guidance on Internet research (see the links in the Other Internet Resources section, under Laws and Government Documents). The Preamble to the Revised Rule referenced significant changes in the research environment, recognizing a need to broaden the scope of the Rule. However, substantial changes to the actual Rule with regard to Internet research, in its broadest context, were minimal.

For example, the Preamble states:

This final rule recognizes that in the past two decades a paradigm shift has occurred in how research is conducted. Evolving technologies—including imaging, mobile technologies, and the growth in computing power—have changed the scale and nature of information collected in many disciplines. Computer scientists, engineers, and social scientists are developing techniques to integrate different types of data so they can be combined, mined, analyzed, and shared. The advent of sophisticated computer software programs, the Internet, and mobile technology has created new areas of research activity, particularly within the social and behavioral sciences (Federal Register 2017 and HHS 2017).

Modest changes to the definition of human subjects included changing “data” to “information” and “biospecimens;” the definition now reads:

  • (1) Human subject means a living individual about whom an investigator (whether professional or student) conducting research:
    • (i) Obtains information or biospecimens through intervention or interaction with the individual, and uses, studies, or analyzes the information or biospecimens; or
    • (ii) Obtains, uses, studies, analyzes, or generates identifiable private information or identifiable biospecimens.
  • (2) Intervention includes both physical procedures by which information or biospecimens are gathered (e.g., venipuncture) and manipulations of the subject or the subject's environment that are performed for research purposes.
  • (3) Interaction includes communication or interpersonal contact between investigator and subject.
  • (4) Private information includes information about behavior that occurs in a context in which an individual can reasonably expect that no observation or recording is taking place, and information that has been provided for specific purposes by an individual and that the individual can reasonably expect will not be made public (e.g., a medical record).
  • (5) Identifiable private information is private information for which the identity of the subject is or may readily be ascertained by the investigator or associated with the information.
  • (6) An identifiable biospecimen is a biospecimen for which the identity of the subject is or may readily be ascertained by the investigator or associated with the biospecimen (45 C.F.R. § 46.102 (2018)).

However, the Revised Rule does have a provision that stands to be of import with regard to Internet research; the Rule calls for implementing departments or agencies to,

  • (i) Upon consultation with appropriate experts (including experts in data matching and re-identification), reexamine the meaning of “identifiable private information”, as defined in paragraph (e)(5) of this section, and “identifiable biospecimen”, as defined in paragraph (e)(6) of this section. This reexamination shall take place within 1 year and regularly thereafter (at least every 4 years). This process will be conducted by collaboration among the Federal departments and agencies implementing this policy. If appropriate and permitted by law, such Federal departments and agencies may alter the interpretation of these terms, including through the use of guidance.
  • (ii) Upon consultation with appropriate experts, assess whether there are analytic technologies or techniques that should be considered by investigators to generate “identifiable private information”, as defined in paragraph (e)(5) of this section, or an “identifiable biospecimen”, as defined in paragraph (e)(6) of this section. This assessment shall take place within 1 year and regularly thereafter (at least every 4 years). This process will be conducted by collaboration among the Federal departments and agencies implementing this policy. Any such technologies or techniques will be included on a list of technologies or techniques that produce identifiable private information or identifiable biospecimens. This list will be published in the Federal Register after notice and an opportunity for public comment. The Secretary, HHS, shall maintain the list on a publicly accessible Web site (45 C.F.R. § 46.102 (2018)).

As of this writing, there has not yet been a reexamination of the concepts of “identifiable private information” or “identifiable biospecimens”. However, as data analytics, AI, and machine learning continue to expose ethical issues in human subjects research, we expect to see engaged discussion at the federal level and amongst research communities (PRIM&R 2021). Those discussions may refer to previous conceptual work by Carpenter and Dittrich (2012) and Aycock et al. (2012) that is concerned with risk and identifiability. Secondary uses of identifiable, private data, for example, may pose downstream harms, or unintentional risks, causing reputational or informational harms. Reexaminations of “identifiable private information” cannot occur without serious consideration of risk and “human harming research”. Carpenter and Dittrich (2012) encourage

“Review boards should transition from an informed consent driven review to a risk analysis review that addresses potential harms stemming from research in which a researcher does not directly interact with the at-risk individuals” (p. 4) as “[T]his distance between researcher and affected individual indicates that a paradigm shift is necessary in the research arena. We must transition our idea of research protection from ‘human subjects research’ to ‘human harming research’” (p. 14).[1]

Similarly, Aycock et al. (2012) assert that

Researchers and boards must balance presenting risks related to the specific research with risks related to the technologies in use. With computer security research, major issues around risk arise, for society at large especially. The risk may not seem evident to an individual but in the scope of security research, larger populations may be vulnerable. There is a significant difficulty in quantifying risks and benefits, in the traditional sense of research ethics….An aggregation of surfing behaviors collected by a bot presents greater distance between researcher and respondent than an interview done in a virtual world between avatars. This distance leads us to suggest that computer security research focus less concern around human subjects research in the traditional sense and more concern with human harming research (p. 3, italics original).

These two conceptual notions are relevant for considering emergent forms of identities or personally identifiable information (PII), such as avatars, virtual beings, bots, and textual and graphical information. Under the Code of Federal Regulations (45 C.F.R. § 46.102(f) 2009), new forms of representation are considered human subjects if PII about living individuals is obtained. PII can be obtained by researchers through scraping data sources, profiles or avatars, or other pieces of data made available by the platform. Fairfield agrees: “An avatar, for example, does not merely represent a collection of pixels—it represents the identity of the user” (2012: 701).
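The Revised Rule's concern with data matching and re-identification, noted above, can be made concrete with a minimal sketch. Everything below is hypothetical: the records, names, and field choices are fabricated for illustration only. The point is simply that a "de-identified" dataset can become identifiable private information once it is linked to a public source that shares the same quasi-identifiers (here, ZIP code, birth year, and gender).

```python
# Hypothetical sketch of re-identification by linking quasi-identifiers.
# All records and names are fabricated for illustration.

# A "de-identified" research dataset: direct identifiers removed,
# but quasi-identifiers remain.
research_data = [
    {"zip": "53701", "birth_year": 1984, "gender": "F", "diagnosis": "condition A"},
    {"zip": "53702", "birth_year": 1990, "gender": "M", "diagnosis": "condition B"},
]

# A public auxiliary dataset (e.g., scraped profile pages) that pairs
# the same quasi-identifiers with names.
public_profiles = [
    {"name": "Alice Example", "zip": "53701", "birth_year": 1984, "gender": "F"},
    {"name": "Bob Example", "zip": "53703", "birth_year": 1975, "gender": "M"},
]

def reidentify(records, profiles):
    """Link 'de-identified' records to named profiles on shared quasi-identifiers."""
    matches = []
    for r in records:
        for p in profiles:
            if (r["zip"], r["birth_year"], r["gender"]) == (
                p["zip"], p["birth_year"], p["gender"]
            ):
                matches.append((p["name"], r["diagnosis"]))
    return matches

print(reidentify(research_data, public_profiles))
# → [('Alice Example', 'condition A')]
```

A single successful linkage of this kind is enough to convert a record into identifiable private information, which is why the Rule directs agencies to reexamine the definition in consultation with experts in data matching and re-identification rather than treating "de-identified" as a fixed property of a dataset.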

The multiple academic disciplines already long engaged in human subjects research (medicine, sociology, anthropology, psychology, communication) have established ethical guidelines intended to assist researchers and those charged with ensuring that research on human subjects follows both legal requirements and ethical practices. But with research involving the Internet—where individuals increasingly share personal information on platforms with porous and shifting boundaries, where both the spread and the aggregation of data from disparate sources have become the norm, and where web-based services, and their privacy policies and terms of service statements, morph and evolve rapidly—the ethical frameworks and assumptions traditionally used by researchers and REBs are frequently challenged.

Research ethics boards themselves are increasingly challenged by the unique ethical dimensions of internet-based research protocols. In a 2008 survey of U.S. IRBs, less than half of the ethical review boards identified internet-based research as “an area of concern or importance” at that time, and only 6% had guidelines or checklists in place for reviewing internet-based research protocols (Buchanan & Ess 2009). By 2015, 93% of IRBs surveyed acknowledged that there are ethical issues unique to research using “online data”, yet only 55% said they felt their IRBs were well versed in the technical aspects of online data collection, and only 57% agreed that their IRB had the expertise to stay abreast of changes in online technology. IRBs are now further challenged by the growth of big data research (see §4.5 below), which increasingly relies on large datasets of personal information generated via social media, digital devices, or other means often hidden from users. A 2019 study of IRBs revealed that only 25% felt prepared to evaluate protocols relying on big data, and only 6% had tools sufficient for considering this emerging area of internet research (Zimmer & Chapman 2020). Further, when presented with various hypothetical research scenarios utilizing big data and asked how their IRB would likely review such a protocol, respondents’ viewpoints differed strongly in many cases. Consider the following scenario:

Researchers plan to scrape public comments from online newspaper pages to predict election outcomes. They will aggregate their analysis to determine public sentiment. The researchers don’t plan to inform commenters, and they plan to collect potentially identifiable user names. Scraping comments violates the newspaper’s terms of service.

18% of respondents indicated their IRB would view this as exempt, 21% indicated expedited review, 33% suggested it would need full board review, while 28% did not think this was even human subjects research that would fall under their IRB’s purview (Zimmer & Chapman 2020). This points to potential gaps and inconsistencies in how IRBs review the ethical implications of big data research protocols.

3. History and Development of IRE as a Discipline

An extensive body of literature has developed since the 1990s around the use of the Internet for research (S. Jones 1999; Hunsinger, Klastrup, & Allen (eds.) 2010; Consalvo & Ess (eds.) 2011; Zimmer & Kinder-Kurlanda (eds.) 2017), with a growing emphasis on the ethical dimensions of Internet research.

A flurry of Internet research, and explicit concern for the ethical issues concurrently at play in it, began in the mid-1990s. In 1996, Storm King recognized the growing use of the Internet as a venue for research. His work explored how the American Psychological Association’s guidelines for human subjects research applied to emergent forms of email, chat, listservs, and virtual communities. With careful attention to risk and benefit to Internet subjects, King offered a cautionary note:

When a field of study is new, the fine points of ethical considerations involved are undefined. As the field matures and results are compiled, researchers often review earlier studies and become concerned because of the apparent disregard for the human subjects involved (1996: 119).

The 1996 issue of Information Society dedicated to Internet research is considered a watershed moment, and included much seminal research still of impact and relevance today (Allen 1996; Boehlefeld 1996; Reid 1996).

Sherry Turkle’s 1997 Life on the Screen: Identity in the Age of the Internet called direct attention to the human element of online game environments. Moving squarely towards person-based versus text-based research, Turkle pushed researchers to consider the human subjects implications of Internet research. Similarly, Markham’s Life Online: Researching Real Experience in Virtual Space (1998) highlighted the methodological complexities of online ethnographic studies, as did Jacobson’s 1999 methodological treatment of Internet research. The “field” of study changed the dynamics of researcher-researched roles, identity, and representation of participants from virtual spaces. Markham’s work in qualitative online research has been influential across disciplines, as research in nursing, psychology, and medicine has recognized the potential of this paradigm for online research (Flicker et al. 2004; Eysenbach & Till 2001; Seaboldt & Kupier 1997; Sharf 1997).

Then, in 1999, the American Association for the Advancement of Science (AAAS), with a contract from the U.S. Office for Protection from Research Risks (now known as the Office for Human Research Protections), convened a workshop, with the goal of assessing the alignment of traditional research ethics concepts to Internet research. The workshop acknowledged

The vast amount of social and behavioral information potentially available on the Internet has made it a prime target for researchers wishing to study the dynamics of human interactions and their consequences in this virtual medium. Researchers can potentially collect data from widely dispersed populations at relatively low cost and in less time than similar efforts in the physical world. As a result, there has been an increase in the number of Internet studies, ranging from surveys to naturalistic observation (Frankel & Siang 1999: 1).

In the medical/biomedical contexts, Internet research has grown rapidly. Also in 1999, Gunther Eysenbach wrote the first editorial to the newly formed Journal of Medical Internet Research. There were three driving forces behind the inception of this journal, and Eysenbach called attention to the growing social and interpersonal aspects of the Internet:

First, Internet protocols are used for clinical information and communication. In the future, Internet technology will be the platform for many telemedical applications. Second, the Internet revolutionizes the gathering, access and dissemination of non-clinical information in medicine: Bibliographic and factual databases are now world-wide accessible via graphical user interfaces, epidemiological and public health information can be gathered using the Internet, and increasingly the Internet is used for interactive medical education applications. Third, the Internet plays an important role for consumer health education, health promotion and teleprevention. (As an aside, it should be emphasized that “health education” on the Internet goes beyond the traditional model of health education, where a medical professional teaches the patient: On the Internet, much “health education” is done “consumer-to-consumer” by means of patient self support groups organizing in cyberspace. These patient-to-patient interchanges are becoming an important part of healthcare and are redefining the traditional model of preventive medicine and health promotion).

With scholarly attention growing and with the 1999 AAAS report (Frankel & Siang 1999) calling for action, other professional associations took notice and began drafting statements, guidelines, or addenda to their extant professional standards. For example, the Board of Scientific Affairs (BSA) of the American Psychological Association established an Advisory Group on Conducting Research on the Internet in 2001; the American Counseling Association addressed online research in the 2005 revision to its Code of Ethics; and the Association of Internet Researchers (AoIR) Ethics Working Group Guidelines and the National Committee for Research Ethics in the Social Sciences and the Humanities (NESH, Norway), among others, have directed researchers and review boards to the ethics of Internet research, with attention to the most common areas of ethical concern (see OIR for links).

While many researchers focus on traditional research ethics principles, conceptualizations of Internet research ethics depend on disciplinary perspectives. Some disciplines, notably from the arts and humanities, posit that Internet research is more about context and representation than about “human subjects”, suggesting there is no intent to engage in research about actual persons, and thus minimal or no harm. The debate has continued since the early 2000s. White (2002) argued against extant regulations that favored or privileged specific ideological, disciplinary and cultural prerogatives, which limit the freedoms and creativity of arts and humanities research. For example, she notes that the AAAS report “confuses physical individuals with constructed materials and human subjects with composite cultural works”, again calling attention to the person versus text divide that has permeated Internet research ethics debates. Another example of disciplinary differences comes from the Oral History Association, which acknowledged the growing use of the Internet as a site for research:

Simply put, oral history collects memories and personal commentaries of historical significance through recorded interviews. An oral history interview generally consists of a well-prepared interviewer questioning an interviewee and recording their exchange in audio or video format. Recordings of the interview are transcribed, summarized, or indexed and then placed in a library or archives. These interviews may be used for research or excerpted in a publication, radio or video documentary, museum exhibition, dramatization or other form of public presentation. Recordings, transcripts, catalogs, photographs and related documentary materials can also be posted on the Internet (Ritchie 2003: 19).

While the American Historical Association (A. Jones 2008) has argued that such research be “explicitly exempted” from ethical review board oversight, the use of the Internet could complicate such a stance if such data became available in public settings or available “downstream” with potential, unforeseeable risks to reputation, economic standing, or psychological harm, should identification occur.

Under the concept of text rather than human subjects, Internet research rests on arguments of publication and copyright; consider the venue of a blog, which does not meet the definition of human subject as in 45 C.F.R. § 46.102(f) (2009), as interpreted by most ethical review boards. A researcher need not obtain consent to use text from a blog, as it is generally considered publicly available, textual, published material. This “public park” analogy, generally accepted by researchers, is appropriate for some Internet venues and tools, but not all: context, intent, sensitivity of data, and expectations of Internet participants were identified in 2004 by Sveningsson as crucial markers in Internet research ethics considerations.

By the mid-2000s, with three major anthologies published, and a growing literature base, there was ample scholarly literature documenting IRE across disciplines and methodologies, and subsequently, there was anecdotal data emerging from the review boards evaluating such research. In search of empirical data regarding the actual review board processes of Internet research from a human subjects perspective, Buchanan and Ess surveyed over 700 United States ethics review boards, and found that boards were primarily concerned with privacy, data security and confidentiality, and ensuring appropriate informed consent and recruitment procedures (Buchanan & Ess 2009; Buchanan & Hvizdak 2009).

In 2008, the Canadian Tri-Council’s Social Sciences and Humanities Research Ethics Special Working Committee: A Working Committee of the Interagency Advisory Panel on Research Ethics was convened (Blackstone et al. 2008); and in 2010, a meeting at the Secretary’s Advisory Committee to the Office for Human Research Protections highlighted Internet research (SACHRP 2010). Prominent professional organizations such as Public Responsibility in Medicine and Research (PRIM&R) and the American Educational Research Association (AERA) have begun featuring Internet research ethics regularly at their conferences and in related publications.

Recently, disciplines not traditionally involved in human subjects research have begun their own explorations of IRE. For example, researchers in computer security have actively examined the tenets of research ethics in CS and ICT (Aycock et al. 2012; Dittrich, Bailey, & Dietrich 2011; Carpenter & Dittrich 2012; Buchanan et al. 2011). Notably, the U.S. Federal Register requested comments on “The Menlo Report” in December 2011, which calls for a commitment by computer science researchers to the three principles of respect for persons, beneficence, and justice, while also adding a fourth principle on respect for law and public interest (Homeland Security 2011). SIGCHI, an international society for professionals, academics, and students interested in human-technology and human-computer interaction (HCI), has increasingly focused on how IRE applies to work in their domain (Frauenberger et al. 2017; Fiesler et al. 2018).

4. Key Ethical Issues in Internet Research

4.1 Privacy

Principles of research ethics dictate that researchers must ensure there are adequate provisions to protect the privacy of subjects and to maintain the confidentiality of any data collected. A violation of privacy or breach of confidentiality presents a risk of serious harm to participants, ranging from the exposure of personal or sensitive information, the divulgence of embarrassing or illegal conduct, or the release of data otherwise protected under law.

Research ethics concerns around individual privacy are often expressed in terms of the degree to which data can be linked to individuals, and the potential harms from disclosure of that information. As Internet research has grown in complexity and computational sophistication, ethics concerns have focused on current and future uses of data, and the potential downstream harms that could occur. Protecting research participants’ privacy and confidentiality is typically achieved through a combination of research tactics and practices, including engaging in data collection under controlled or anonymous environments, the scrubbing of data to remove personally identifiable information (PII), or the use of access restrictions and related data security methods. The specificity and characteristics of the data will often dictate whether there are regulatory considerations, in addition to the methodological considerations around privacy and confidentiality. PII, for example, typically demands the most stringent protections. The National Institutes of Health (NIH) defines PII as:

any information about an individual maintained by an agency, including, but not limited to, education, financial transactions, medical history, and criminal or employment history and information which can be used to distinguish or trace an individual’s identity, such as their name, SSN, date and place of birth, mother’s maiden name, biometric records, etc., including any other personal information that is linked or linkable to an individual (NIH 2010).

Typically, examples of identifying pieces of information have included personal characteristics (such as date of birth, place of birth, mother’s maiden name, gender, sexual orientation, and other distinguishing features and biometrics information, such as height, weight, physical appearance, fingerprints, DNA and retinal scans), unique numbers or identifiers assigned to an individual (such as a name, address, phone number, social security number, driver’s license number, financial account numbers), and descriptions of physical location (GIS/GPS log data, electronic bracelet monitoring information).
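The “scrubbing” of direct identifiers mentioned above is often automated in a first pass. A minimal illustration in Python follows; the patterns and labels are hypothetical examples, and pattern-based scrubbing alone is rarely sufficient, since quasi-identifiers and free-text clues survive it:

```python
import re

# Illustrative only: redaction patterns for a few U.S.-style identifiers.
# Real scrubbing pipelines need far broader coverage (names, locations,
# dates in free text) plus manual review; regexes alone are insufficient.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def scrub(text: str) -> str:
    """Replace matched identifiers with labeled placeholders."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text
```

A post such as “Call 555-867-5309” would come back as “Call [PHONE REDACTED]”; anything the patterns miss, including the indirect identifiers discussed below, passes through untouched.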

The 2018 EU General Data Protection Regulation (GDPR) lays out the legal and regulatory requirements for data use across the EU. Mondschein & Monda (2018) provide a thorough discussion of the different types of data considered in the GDPR:
• personal data, such as names, identification numbers, and location data;
• special categories of personal data, such as race or ethnic origin, political opinions, or religious beliefs;
• pseudonymous data, referring to data that has been altered so the subject cannot be directly identified without further information;
• anonymous data, information which does not relate to an identifiable natural person, or personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable.
They also note that considering

data protection issues at an early stage of a research project is of great importance specifically in the context of large-scale research endeavours that make use of personal data (2018: 56).

Internet research introduces new complications to these longstanding definitions and regulatory frameworks intended to protect subject privacy. For example, researchers increasingly are able to collect detailed data about individuals from sources such as Facebook, Twitter, blogs or public email archives, and these rich data sets can more easily be processed, compared, and combined with other data (and datasets) available online. In numerous cases, both researchers and members of the general public have been able to re-identify individuals by analyzing and comparing such datasets, using data fields as benign as one’s zip code (Sweeney 2002), random Web search queries (Barbaro & Zeller 2006), or movie ratings (Narayanan & Shmatikov 2008) as the vital key for reidentification of a presumed anonymous user. Prior to widespread Internet-based data collection and processing, few would have considered one’s movie ratings or zip code as personally identifiable. Yet, these cases reveal that merely stripping traditional “identifiable” information such as a subject’s name, address, or social security number is no longer sufficient to ensure data remains anonymous (Ohm 2010), and requires the reconsideration of what is considered “personally identifiable information” (Schwartz & Solove 2011). This points to the critical distinction between data that is kept confidential versus data that is truly anonymous. In practice, data are rarely completely anonymous, as researchers have routinely demonstrated they can often reidentify individuals hidden in “anonymized” datasets with ease (Ohm 2010). This reality places new pressure on ensuring datasets are kept, at the least, suitably confidential through both physical and computational security measures. These measures may also include requirements to store data in “clean rooms”, or in non-networked environments in an effort to control data transmission.
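Sweeney’s zip-code result gave rise to one formal response to this re-identification risk, the notion of k-anonymity: a released dataset is k-anonymous if every combination of “quasi-identifier” values is shared by at least k records, so no one is distinguishable within a group smaller than k. A minimal sketch in Python, with hypothetical records and field names:

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Check whether every combination of quasi-identifier values
    (e.g., zip code and birth year) appears in at least k records."""
    counts = Counter(
        tuple(r[q] for q in quasi_identifiers) for r in records
    )
    return all(n >= k for n in counts.values())

# Hypothetical records: even with names stripped, the unique
# (zip, year) pair in the third row singles out one person.
records = [
    {"zip": "537**", "year": 1980, "diagnosis": "flu"},
    {"zip": "537**", "year": 1980, "diagnosis": "asthma"},
    {"zip": "541**", "year": 1975, "diagnosis": "flu"},
]
print(is_k_anonymous(records, ["zip", "year"], 2))  # False: one group of size 1
```

Achieving k-anonymity in practice means generalizing or suppressing quasi-identifier values until every group reaches size k, at a cost in data utility that echoes Ohm’s point below.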

Similarly, new types of data often collected in Internet research might also be used to identify a subject within a previously-assumed anonymous dataset. For example, Internet researchers might collect Internet Protocol (IP) addresses when conducting online surveys or analyzing transaction logs. An IP address is a unique identifier that is assigned to every device connected to the Internet; in most cases, individual computers are assigned a unique IP address, while in some cases the address is assigned to a larger node or Internet gateway for a collection of computers. Nearly all websites and Internet service providers store activity logs that link activity with IP addresses, in many cases, eventually to specific computers or users. Current U.S. law does not hold IP addresses to be personally identifiable information, while other countries and regulatory bodies do. For example, the European Data Privacy Act, at Article 29, holds that IP addresses do constitute PII. Buchanan et al. (2011) note, however, that under the U.S. Civil Rights Act, for the purposes of the HIPAA Act,[2] IP addresses are considered a form of PII (45 C.F.R. § 164.514 2002).[3] Other federal regulatory agencies could potentially reconsider IP addresses as PII, and researchers and boards will need to be attentive should such a change occur.
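Where a study must link repeated visits but treat IP addresses as identifiers, one common mitigation (sketched here as a hypothetical practice, not a regulatory requirement) is to replace each raw address with a keyed hash before storage:

```python
import hashlib
import hmac

# Hypothetical pseudonymization step: a keyed hash lets log records from
# the same visitor be linked without retaining the IP address itself.
# The secret key must be stored separately from (and never released with)
# the dataset; note this yields pseudonymous, not anonymous, data under
# frameworks such as the GDPR, since the key holder can re-link records.
SECRET_KEY = b"replace-with-a-randomly-generated-key"

def pseudonymize_ip(ip_address: str) -> str:
    digest = hmac.new(SECRET_KEY, ip_address.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]  # truncated token stored in place of the IP
```

Because the mapping is deterministic, two survey responses from the same address receive the same token; discarding the key afterward removes the researcher’s own ability to re-identify, though it does not by itself make the dataset anonymous.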

A similar complication emerges when we consider the meaning of “private information” within the context of Internet-based research. U.S. federal regulations define “private information” as:

[A]ny information about behavior that occurs in a context in which an individual can reasonably expect that no observation or recording is taking place, and information that has been provided for specific purposes by an individual and that the individual can reasonably expect will not be made public (for example, a medical record) (45 C.F.R. § 46.102(f) 2009).

This standard definition of “private information” has two key components. First, private information is that which subjects reasonably expect is not normally monitored or collected. Second, private information is that which subjects reasonably expect is not typically publicly available. Conversely, the definition also suggests the opposite is true: if users cannot reasonably expect data isn’t being observed or recorded, or they cannot expect data isn’t publicly available, then the data does not rise to the level of “private information” requiring particular privacy protections. Researchers and REBs have routinely worked with this definition of “private information” to ensure the protection of individuals’ privacy.

These distinctions take on greater weight, however, when considering the data environments and collection practices common with Internet-based research. Researchers interested in collecting or analyzing online actions of subjects—perhaps through the mining of online server logs, the use of tracking cookies, or the scraping of social media profiles and feeds—could argue that subjects do not have a reasonable expectation that such online activities are not routinely monitored since nearly all online transactions and interactions are routinely logged by websites and service providers. Thus, online data trails might not rise to the level of “private information”. However, numerous studies have indicated that average Internet users have incomplete understandings of how their activities are routinely tracked, and the related privacy practices and policies of the sites they visit (Hoofnagle & King 2008 [OIR]; Milne & Culnan 2004; Tsai et al. 2006). Hudson and Bruckman (2005) conducted empirical research on users’ expectations and understandings of privacy, finding that participants’ expectations of privacy within public chatrooms conflicted with what was actually a very public online space. Rosenberg (2010) examined the public/private distinction in the realm of virtual worlds, suggesting researchers must determine what kind of social norms and relations predominate an online space before making assumptions about the “publicness” of information shared within. Thus, it remains unclear whether Internet users truly understand if and when their online activity is regularly monitored and tracked, and what kind of reasonable expectations truly exist. This ambiguity creates new challenges for researchers and REBs when trying to apply the definition of “private information” to ensure subject privacy is properly addressed (Zimmer 2010).

This complexity in addressing subject privacy in Internet research is further compounded with the rise of social networking as a place for the sharing of information, and a site for research. Users increasingly share more and more personal information on platforms like Facebook or Twitter. For researchers, social media platforms provide a rich resource for study, and much of the content is available to be viewed and downloaded with minimal effort. Since much of the information posted to social media sites is publicly viewable, it thus fails to meet the standard regulatory definition of “private information”. Therefore, researchers attempting to collect and analyze social media postings might not treat the data as requiring any particular privacy considerations. Yet, social media platforms represent a complex environment of social interaction where users are often required to place friends, lovers, colleagues, and minor acquaintances within the same singular category of “friends”, where privacy policies and terms of service are not fully understood (Madejski et al. 2011), and where the technical infrastructures fail to truly support privacy protections (Bonneau & Preibusch 2010) and regularly change with little notice (Stone 2009 [OIR]; Zimmer 2009 [OIR]). As a result, it is difficult to understand with any certainty what a user’s intention was when posting an item onto a social media platform (Acquisti & Gross 2006). The user may have intended the post for a private group but failed to completely understand how to adjust the privacy settings accordingly. Or, the information might have previously been restricted to only certain friends, but a change in the technical platform suddenly made the data more visible to all.

Ohm (2010) warns that

the utility and privacy of data are linked, and so long as data is useful, even in the slightest, then it is also potentially reidentifiable (2010: 1751).

With the rapid growth of Internet-based research, Ohm’s concern becomes even more dire. The traditional definitions and approaches to understanding the nature of privacy, anonymity, and precisely what kind of information deserves protection become strained, forcing researchers and REBs to consider more nuanced theories of privacy (Nissenbaum 2009) and approaches to respecting and protecting subject privacy (Markham 2012; Zimmer 2010).

4.2 Recruitment

Depending on the type of Internet research being carried out, recruitment of participants may be done in a number of ways. As with any form of research, the study population or participants are selected for specific purposes (e.g., an ethnographic study of a particular group of online game players), or can be selected from a range of sampling techniques (e.g., a convenience sample gleaned from the users of Amazon’s Mechanical Turk crowdsourcing platform[4]). In the U.S. context, a recruitment plan is considered part of the informed consent process, and as such, any recruitment script or posting must be reviewed and approved by an REB prior to posting or beginning solicitation (if the project is human subjects research). Further, the selection of participants must be impartial and unbiased, and any risks and benefits must be justly distributed. This concept is challenging to apply in Internet contexts, in which populations are often self-selected and can be exclusive, depending on membership and access status, as well as the common disparities of online access based on economic and social variables. Researchers also face recruitment challenges due to online subjects’ potential anonymity, especially as it relates to the frequent use of pseudonyms online, multiple or alternative online identities, and the general challenges of verifying a subject’s age and demographic information. Moreover, basic ethical principles for approaching and recruiting participants involve protecting their privacy and confidentiality. Internet research can maximize these protections, as an individual may never be known beyond a screen name or avatar; conversely, the use of IP addresses, the placement of cookies, and the availability of and access to more information than necessary for the research purposes may minimize the protections of privacy and confidentiality.

Much recruitment is taking place via social media; examples include push technologies, a synchronous approach in which a text or tweet is sent from a researcher to potential participants based on profile data, platform activity, or geolocation. Pull-technology recruitment methods include direct email, dedicated web pages, YouTube videos, direct solicitation via “stickies” posted on fora or web sites directing participants to a study site, or data aggregation or scraping for potential recruitment. Regardless of the means used, researchers must follow the terms of the site—from the specific norms and nuances governing a site or locale to the legal issues in terms of service agreements. For example, early pro-anorexia web sites (see Overbeke 2008) were often treated as sensitive spaces deserving special consideration, and researchers were asked to respect the privacy of the participants and not engage in research (Walstrom 2004). In the gaming context, Reynolds and de Zwart (2010) ask:

Has the researcher disclosed the fact that he or she is engaged in research and is observing/interacting with other players for the purposes of gathering research data? How does the research project impact upon the community and general game play? Is the research project permitted under the Terms of Service?

Colvin and Lanigan (2005: 38) suggest researchers

Seek permission from Web site owners and group moderators before posting recruitment announcements. Then, preface the recruitment announcement with a statement that delineates the permission that has been granted, including the contact person and date received. Identify a concluding date (deadline) for the research study and make every effort to remove recruitment postings, which often become embedded within Web site postings.

Barratt and Lenton, among others, agree:

It is critical, therefore, to form partnerships with online community moderators by not only asking their permission to post the request, but eliciting their feedback and support as well (2010: 71).

Mendelson (2007) and Smith and Leigh (1997) note that recruitment notices need to contain more information than the typical flyers or advertisements used for newspaper advertisements. Mentioning the approval of moderators is important for establishing authenticity, and so is providing detailed information about the study and how to contact both the researchers and the appropriate research ethics board.

Given the array of techniques possible for recruitment, the concept of “research spam” requires attention. The Council of American Survey Research Organizations (CASRO) warns

Research Organizations should take steps to limit the number of survey invitations sent to targeted respondents by email solicitations or other methods over the Internet so as to avoid harassment and response bias caused by the repeated recruitment and participation by a given pool (or panel) of data subjects (CASRO 2011: I.B.3).

Ultimately, researchers using Internet recruitment measures must ensure that potential participants are getting enough information in both the recruitment materials and any subsequent consent documents. Researchers must also ensure that recruitment methods do not lead to an individual being identified without their permission and, where such identification is possible, must weigh whether significant risks are involved.

4.3 Informed Consent

As the cornerstone of human subjects protections, informed consent means that participants are voluntarily participating in the research with adequate knowledge of relevant risks and benefits. Providing informed consent typically includes the researcher explaining the purpose of the research, the methods being used, the possible outcomes of the research, as well as associated risks or harms that the participants might face. The process involves providing the recipient clear and understandable explanations of these issues in a concise way, providing sufficient opportunity to consider them and enquire about any aspect of the research prior to granting consent, and ensuring the subject has not been coerced into participating. Gaining consent in traditional research is typically done verbally or in writing, either in a face-to-face meeting where the researcher reviews the document, through telephone scripts, through mailed documents, fax, or video, and can be obtained with the assistance of an advocate in the case of vulnerable populations. Most importantly, informed consent was built on the ideal of “process” and the verification of understanding, and thus, requires an ongoing communicative relationship between and among researchers and their participants. The emergence of the Internet as both a tool and a venue for research has introduced challenges to this traditional approach to informed consent.

In most regulatory frameworks, there are instances when informed consent might be waived, or the standard processes of obtaining informed consent might be modified, if approved by a research ethics board.[5] Various forms of Internet research require different approaches to the consent process. Some standards have emerged, depending on venue (i.e., an online survey platform versus a private Facebook group). However, researchers are encouraged to consider waiver of consent and/or documentation, if appropriate, by using the flexibilities of their extant regulations.

Where consent is required but documentation has been waived by an ethical review board, a “portal” can be used to provide consent information. For example, a researcher may send an email to the participant with a link to a separate portal or site information page where information on the project is contained. The participant can read the documentation and click on an “I agree” submission. Rosser et al. (2010) recommend using a “chunked” consent document, whereby individuals can read specific sections, agree, and then continue onwards to completion of the consent form, until reaching the study site.

In addition to portals, researchers will often make use of consent cards or tokens; this alleviates concerns that unannounced researcher presence is unacceptable, or, that a researcher’s presence is intrusive to the natural flow and movement of a given locale. Hudson and Bruckman (2004, 2005) highlighted the unique challenges in gaining consent in chat rooms, while Lawson (2004) offers an array of consent possibilities for synchronous computer-mediated communication. There are different practical challenges in the consent process in Internet research, given the fluidity and temporal nature of Internet spaces.

If documentation of consent is required, some researchers have utilized alternatives such as electronic signatures, which can range from a simple electronic check box to acknowledge acceptance of the terms to more robust means of validation using encrypted digital signatures, although the validity of electronic signatures varies by jurisdiction.

Regardless of venue, informed consent documents are undergoing changes in the information provided to research participants. While the basic elements of consent remain intact, researchers must now speak with less certainty about specific aspects of data longevity, risks to privacy, confidentiality and anonymity (see §4.1 Privacy, above), and access to or ownership of data. Research participants must understand that their consent to a terms of service or end user license agreement is distinct from their consent to participate in research. And, researchers must address and inform participants/subjects about the potential risk of data intrusion or misappropriation should data subsequently be made public or available outside of the confines of the original research. Statements should be revised to reflect such realities as cloud storage (see §4.4 below) and data sharing.

For example, Aycock et al. (2012: 141) describe a continuum of security and access statements used in informed consent documents:

  • “No others will have access to the data”
  • “Anonymous identifiers will be used during all data collection and analysis and the link to the subject identifiers will be stored in a secure manner”
  • “Data files that contain summaries of chart reviews and surveys will only have study numbers but no data to identify the subject. The key [linking] subject names and these study identifiers will be kept in a locked file”
  • “Electronic data will be stored on a password protected and secure computer that will be kept in a locked office. The software ‘File Vault’ will be used to protect all study data loaded to portable laptops, flash drives or other storage media. This will encode all data… using Advanced Encryption Standard with 128-bit keys (AES-128)”

This use of encryption in the last statement may be necessary in research including sensitive data, such as medical, sexual, health, financial, and so on. Barratt and Lenton (2010), in their research on illicit drug use and online forum behaviors, also provide guidance about use of secure transmission and encryption as part of the consent process.

In addition to informing participants about potential risks and employing technological protections, NIH-funded researchers whose work includes projects with identifiable, sensitive information will automatically be issued a Certificate of Confidentiality:

CoCs protect the privacy of research subjects by prohibiting disclosure of identifiable, sensitive research information to anyone not connected to the research except when the subject consents or in a few other specific situations (NIH 2021 [OIR]).

However, these do not protect against release of data outside of the U.S. Given the reality of Internet research itself, which inherently spans borders, new models may be in order to ensure confidentiality of data and protections of data. Models of informed consent for traditional international research are fundamentally challenging due to cultural specificity and norms (Annas 2009; Boga et al. 2011; Krogstad et al. 2010); with Internet research, where researchers may be unaware of the specific location of an individual, consent takes on significantly higher demands. While current standards of practice show that consent models stem from the jurisdiction of the researcher and sponsoring research institution, complications arise in the face of age verification, age of majority/consent, reporting of adverse effects or complaints with the research process, and authentication of identity. Various jurisdictional laws around privacy are relevant for the consent process; a useful tool is Forrester’s Data Privacy Heat Map, which relies on in-depth analyses of the data privacy-related laws and cultures of countries around the world, helping researchers design appropriate approaches to privacy and data protection given the particular context (see OIR).

In addition, as more federal agencies and funding bodies across the globe encourage making research data publicly-available (i.e., NSF, NIH, Wellcome Trust, Research Councils U.K.), the language used in consent documents will change accordingly to represent this intended longevity of data and opportunities for future, unanticipated use. Given the ease with which Internet data can flow between and among Internet venues, changes in the overall accessibility of data might occur (early “private” newsgroup conversations were made “publicly searchable” when Google bought DejaNews), and reuse and access by others is increasingly possible with shared datasets. Current data sharing mandates must be considered in the consent process. Alignment between a data sharing policy and an informed consent document is imperative. Both should include provisions for appropriate protection of privacy, confidentiality, security, and intellectual property.

There is general agreement in the U.S. that individual consent is not necessary for researchers to use publicly available data, such as public Twitter feeds. The National Human Research Protections Advisory Committee (NHRPAC) made recommendations regarding publicly available data sets in 2002 (see OIR). Data use or data restriction agreements are commonly used and set the parameters of use for researchers.

The U.K. Data Archive (2011 [OIR]) provides guidance on consent and data sharing:

When research involves obtaining data from people, researchers are expected to maintain high ethical standards such as those recommended by professional bodies, institutions and funding organisations, both during research and when sharing data. Research data — even sensitive and confidential data — can be shared ethically and legally if researchers pay attention, from the beginning of research, to three important aspects:
• when gaining informed consent, include provision for data sharing
• where needed, protect people’s identities by anonymising data
• consider controlling access to data

These measures should be considered jointly. The same measures form part of good research practice and data management, even if data sharing is not envisioned. Data collected from and about people may hold personal, sensitive or confidential information. This does not mean that all data obtained by research with participants are personal or confidential.

Data sharing made public headlines in 2016 when a Danish researcher released a data set comprising scraped data from nearly 70,000 users of the OkCupid online dating site. The data set was highly reidentifiable and included potentially sensitive information: usernames, age, gender, geographic location, the kind of relationship (or sex) users were interested in, personality traits, and answers to thousands of profiling questions used by the site. The researcher claimed the data were public and thus that such sharing and use was unproblematic. Zimmer (2016) was among many privacy and ethics scholars who critiqued this stance.

The researcher did not seek any form of consent, nor provide any debriefing, on the collection and use of the data, and the study had no ethics oversight. Many researchers and ethics boards are, however, attempting to mitigate some of these ethical concerns by including blanket statements in their consent processes, indicating such precautions for research participants. For example,

I understand that online communications may be at greater risk for hacking, intrusions, and other violations. Despite these possibilities, I consent to participate.

A more specific example comes from the Canadian context: when researchers propose to use online survey tools hosted in the United States, REBs commonly recommend the following type of language for use in informed consent documents:

Please note that the online survey is hosted by Company ABC, which is a web survey company located in the U.S.A. All responses to the survey will be stored and accessed in the U.S.A. This company is subject to U.S. laws, in particular the U.S. Patriot Act/Domestic Security Enhancement Act, which allows authorities access to the records of Internet service providers. If you choose to participate in the survey, you understand that your responses to the survey questions will be stored in, and accessed from, the U.S.A. The security and privacy policy for Company ABC can be viewed at http://…/.[6]

Researchers are also encouraged to review the Terms of Use and Terms of Service of the applications being used, describing their details to the REB in the application and informing participants of such details in the informed consent form or script. Researchers are also encouraged to consider broader contextual factors of the data source and research goals when weighing the possible violation of a platform’s Terms of Service (Fiesler, Beard, & Keegan 2020).

4.3.1 Minors and Consent

Internet research poses particular challenges to age verification, assent and consent procedures, and appropriate methodological approaches with minors. Age of consent varies across countries, states, communities, and locales of all sorts. For research conducted or supported by U.S. federal agencies bound by the Common Rule, children are

persons who have not attained the legal age for consent [18, in the U.S.] to treatments or procedures involved in the research, under the applicable law of the jurisdiction in which the research will be conducted (45 C.F.R. § 46.402(a) 2009).

Goldfarb (2008) provides an exhaustive discussion of age of majority across the U.S. states, with a special focus on clinical research, noting children must be seven or older to assent to participation (see 45 C.F.R. § 46 Subpart D 2009).

Spriggs (2010), from the Australian context, notes that while no formal guidance exists on Internet research and minors under the National Statement, she advises:

  • Parental consent may be needed when information is potentially identifiable. Identifiable information makes risks to individuals higher and may mean that the safety net of parental consent is preferable.
  • There is also a need to consider whether seeking parental consent would make things worse, e.g., by putting a young person from a dysfunctional home at risk or resulting in disclosure to the researcher of additional identifying information about the identity and location of the young person. Parental consent may be “contrary to the best interests” of the child or young person when it offers no protection or makes matters worse (2010: 30).

To assist with the consent process, age verification measures can be used, ranging from technical software applications to less formal knowledge checks embedded in an information sheet or consent document. Multiple confirmation points (asking for age, later asking for year of birth, etc.) are practical measures for researchers. Depending on the type, sensitivity, and intended use of the data, researchers and boards will carefully construct the appropriate options for consent, including waiver of consent, waiver of documentation, and/or waiver of parental consent.
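The multiple-confirmation-point approach can be sketched in a few lines of code. The function below is a hypothetical illustration (its name and logic are not drawn from any cited guidance), assuming a survey that asks for age on one screen and year of birth later:

```python
from datetime import date
from typing import Optional

def ages_consistent(stated_age: int, birth_year: int,
                    today: Optional[date] = None) -> bool:
    """Cross-check a stated age against a separately collected birth year.

    Depending on whether this year's birthday has passed, a truthful
    respondent's birth year implies an age of stated_age or stated_age + 1,
    so both are accepted; anything else flags the response for review.
    """
    today = today or date.today()
    implied = today.year - birth_year
    return implied in (stated_age, stated_age + 1)

# Consistent answers pass; a mismatch is flagged rather than silently kept.
print(ages_consistent(17, 2008, today=date(2025, 6, 1)))  # → True
print(ages_consistent(21, 2012, today=date(2025, 6, 1)))  # → False
```

A mismatch does not prove deception, but it can justify pausing the consent flow or excluding the response, consistent with the confirmation measures described above.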

4.4 Cloud Computing and Research Ethics

Recent developments in cloud computing platforms have led to unique opportunities—and ethical challenges—for researchers. Cloud computing describes the deployment of computing resources via the Internet, providing on-demand, flexible, and scalable computing from remote locations. Examples include web-based email and calendaring services provided by Google or Yahoo, online productivity platforms like Google Docs or Microsoft Office 365, online file storage and sharing platforms like Dropbox, and large-scale application development and data processing platforms such as Google Apps, Facebook Developers Platform, and Amazon Web Services.

Alongside businesses and consumers, researchers have begun utilizing cloud computing platforms and services to assist in various tasks, including subject recruitment, data collection and storage, large-scale data processing, as well as communication and collaboration (Allan 2011 [OIR]; X. Chen et al. 2010 [OIR]; Simmhan et al. 2008; Simmhan et al. 2009).

As reliance on cloud computing increases among researchers, so do the ethical implications. Among the greatest concerns is ensuring data privacy and security with cloud-based services. For researchers sharing datasets online for collaborative processing and analysis, steps must be taken to ensure not only that access to online data containing personally identifiable information (PII) is restricted to authorized personnel, but also that suitable encryption is used for data transfer and storage, and that the cloud service provider maintains sufficient security to prevent breaches. Further, once research data is uploaded to a third-party cloud provider, attention must be paid to the terms of service for the contracted provider to determine what level of access to the data, if any, might be allowed to advertisers, law enforcement, or other external agents.

Alongside the privacy and security concerns, researchers also have an ethical duty of data stewardship, which is further complicated when research data is placed in the cloud for storage or processing. Cloud providers might utilize data centers spread across the globe, meaning research data might be located outside the United States and its legal jurisdiction. Terms of service might grant cloud providers a license to access and use research data for purposes not initially intended or approved by the subjects involved. Stewardship may require the prompt and complete destruction of research data, a measure complicated if a cloud provider has distributed and backed up the data across multiple locations.

A distinctive application of cloud computing for research involves crowdsourcing data analysis and processing, that is, leveraging the thousands of users of various online products and services to complete research-related tasks remotely. Examples include using a distributed network of video game players to assist in solving protein folding problems (Markoff 2010), and leveraging Amazon’s Mechanical Turk crowdsourcing marketplace to assist with large-scale data processing and coding tasks that cannot be automated (Conley & Tosti-Kharas 2014; J. Chen et al. 2011). Using cloud-based platforms in these ways raises various critical ethical and methodological issues.

First, new concerns over data privacy and security emerge when research tasks are widely distributed across a global network of users. Researchers must take great care to ensure that research data containing personal or sensitive information is not accessible to outsourced labor, and that none of the users providing crowdsourced labor are able to aggregate and store their own copy of the research dataset. Second, crowdsourcing presents ethical concerns over trust and validity of the research process itself. Rather than being handled by a local team of research assistants under a principal investigator’s supervision and control, crowdsourced work is distributed beyond the direct management or control of the researcher, providing less opportunity to ensure sufficient training for the required tasks. Thus, researchers will need to create additional means of verifying data results to confirm tasks are completed properly and correctly.

Two additional ethical concerns with crowdsourcing involve labor management and authorship. Mechanical Turk workers were not originally intended to be research subjects. Researchers using Mechanical Turk must therefore ensure that the laborers on the other end of the cloud-based relationship are not being exploited, that they are legally eligible to work for hire, and that the incentives provided are real, meaningful, and appropriate (Scholz 2008; Williams 2010 [OIR]).

Finally, at the end of a successful research project utilizing crowdsourcing, a researcher may be confronted with the ethical challenge of how to properly acknowledge the contributions made by (typically anonymous) laborers. Ethical research requires the fair and accurate description of authorship. Disciplines vary as to how to report relative contributions made by collaborators and research assistants, and this dilemma increases when crowdsourcing is used to assist with the research project.

4.5 Big Data Considerations

Algorithmic processing is a corollary of big data research, and new ethical considerations have emerged. From “algorithmic harms” to “predictive analytics”, the power of today’s algorithms outstrips long-standing privacy beliefs and norms. Specifically, the National Science and Technology Council defines:

“Analytical algorithms” as algorithms for prioritizing, classifying, filtering, and predicting. Their use can create privacy issues when the information used by algorithms is inappropriate or inaccurate, when incorrect decisions occur, when there is no reasonable means of redress, when an individual’s autonomy is directly related to algorithmic scoring, or when the use of predictive algorithms chills desirable behavior or encourages other privacy harms. (NSTC 2016: 18).

While the concept of big data is not new, and the term has been in technical discourses since the 1990s, public awareness and response to big data research is much more recent. Following the rise of social media-based research, Buchanan (2016) has delineated the emergence of “big data”-based research from 2012 to the present, with no signs of an endpoint.

Big data research is challenging for research ethics boards, often presenting what the computer ethicist James Moor would call “conceptual muddles”: the inability to properly conceptualize the ethical values and dilemmas at play in a new technological context. Subject privacy, for example, is typically protected within the context of research ethics through a combination of various tactics and practices, including engaging in data collection under controlled or anonymous environments, limiting the personal information gathered, scrubbing data to remove or obscure personally identifiable information, and using access restrictions and related data security methods to prevent unauthorized access and use of the research data itself. The nature and understanding of privacy become muddled, however, in the context of big data research, and as a result, ensuring it is respected and protected in this new domain becomes challenging.
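The scrubbing tactic mentioned above can be illustrated with a minimal sketch. The patterns below are hypothetical and catch only two obvious direct identifiers; real de-identification must also address quasi-identifiers (location, age, rare attributes) that can re-identify a person in combination:

```python
import re

# Hypothetical patterns for two common direct identifiers; a real project
# would maintain a much larger, validated set of rules.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")
HANDLE = re.compile(r"(?<!\w)@\w+")  # e.g., social media usernames

def scrub(text: str) -> str:
    """Replace obvious direct identifiers with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    return HANDLE.sub("[USER]", text)

print(scrub("Contact me at jane.doe@example.org or ping @jane_d"))
# → "Contact me at [EMAIL] or ping [USER]"
```

Such pattern-based scrubbing is only one layer of the combined tactics the passage describes; it does nothing, for instance, against re-identification from the remaining content of a message.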

For example, the determination of what constitutes “private information”—and thus triggers particular privacy concerns—becomes difficult within the context of big data research. Distinctions within the regulatory definition of “private information”—namely, that it only applies to information which subjects reasonably expect is not normally monitored or collected and not normally publicly available—become less clearly applicable when considering the data environments and collection practices that typify big data research, such as the wholesale scraping of Facebook news feed content or public OkCupid accounts.

When considered through the lens of the regulatory definition of “private information”, social media postings are often considered public, especially when users take no visible, affirmative steps to restrict access. As a result, big data researchers might conclude subjects are not deserving of particular privacy consideration. Yet, the social media platforms frequently used for big data research purposes represent a complex environment of socio-technical interactions, where users often fail to understand fully how their social activities might be regularly monitored, harvested, and shared with third parties, where privacy policies and terms of service are not fully understood and change frequently, and where the technical infrastructures and interfaces are designed to make restricting information flows and protecting one’s privacy difficult.

As noted in §4.1 above, it becomes difficult to confirm a user’s intention when sharing information on a social media platform, and whether users recognize that providing information in a social environment also opens it up for widespread harvesting and use by researchers. This uncertainty in the intent and expectations of users of social media and Internet-based platforms—often fueled by the design of the platforms themselves—creates numerous conceptual muddles in our ability to properly alleviate potential privacy concerns in big data research.

The conceptual gaps that exist regarding privacy and the definition of personally identifiable information in the context of big data research inevitably lead to similar gaps regarding when informed consent is necessary. Researchers mining Facebook profile information or public Twitter streams, for example, typically argue that no specific consent is necessary because the information was publicly available. It remains unknown whether users truly understood the technical conditions under which they made information visible on these social media platforms, or whether they foresaw their data being harvested for research purposes rather than just appearing onscreen for fleeting glimpses by their friends and followers (Fiesler & Proferes 2018). In the case of the Facebook emotional contagion experiment (Kramer, Guillory, & Hancock 2014), the lack of consent was initially rationalized through the notion that the research was carried out under Facebook’s extensive terms of service, whose data use policy, while more than 9,000 words long, does make passing mention of “research”. It was later revealed, however, that the data use policy in effect when the experiment was conducted never mentioned “research” at all (Hill 2014).

Additional ethical concerns have arisen surrounding the large-scale data collection practices connected to machine learning and the development of artificial intelligence. For example, negative public attention has surrounded algorithms designed to infer sexual orientation from photographs and facial recognition algorithms trained on videos of transgender people. In both cases, ethical concerns have been raised about both the purpose of these algorithms and the fact that the data that trained them (dating profile photos and YouTube videos, respectively) were “public” but collected from potentially vulnerable populations without consent (Metcalf 2017; Keyes 2019). While those building AI systems cannot always control the conditions under which the data they utilize is collected, their increased use of big datasets captured from social media or related sources raises a number of concerns beyond what is typically considered part of the growing focus on AI ethics: fairness, accountability, and transparency in AI can only be fully possible when data collection is achieved in a fair, ethical, and just manner (Stahl & Wright 2018; Kerry 2020).

4.6 Internet Research and Industry Ethics

The Facebook emotional contagion experiment, discussed above, is just one example in a larger trend of big data research conducted outside of traditional university-based research ethics oversight mechanisms. Nearly all online companies and platforms analyze data and test theories that often rely on data from individual users. Industry-based data research, once limited to marketing-oriented “A/B testing” of benign changes in interface designs or corporate communication messages, now encompasses information about how users behave online, what they click and read, how they move, eat, and sleep, the content they consume online, and even how they move about their homes. Such research produces inferences about individuals’ tastes and preferences, social relations, communications, movements, and work habits. It implies pervasive testing of products and services that are an integral part of intimate daily life, ranging from connected home products to social networks to smart cars. Except in cases where they are partnering with academic institutions, companies typically do not put internal research activities through a formal ethical review process, since results are rarely shared publicly and the perceived impact on users is deemed minimal.

The growth of industry-based big data research, however, presents new risks to individuals’ privacy, on the one hand, and to organizations’ legal compliance, reputation, and brand, on the other hand. When organizations process personal data outside of their original context, individuals may in some cases greatly benefit, but in other cases may be surprised, outraged, or even harmed. Soliciting consent from affected individuals can be impractical: Organizations might collect data indirectly or based on identifiers that do not directly match individuals’ contact details. Moreover, by definition, some non-contextual uses—including the retention of data for longer than envisaged for purposes of a newly emergent use—may be unforeseen at the time of collection. As Crawford and Schultz (2014) note,

how does one give notice and get consent for innumerable and perhaps even yet-to-be-determined queries that one might run that create “personal data”? (2014: 108)

With corporations developing vast “living laboratories” for big data research, research ethics has become a critical component of the design and oversight of these activities. For example, in response to the controversy surrounding the emotional contagion experiment, Facebook developed an internal ethical review process that, according to its facilitators,

leverages the company’s organizational structure, creating multiple training opportunities and research review checkpoints in the existing organizational flow (Jackman & Kanerva 2016: 444).

While such efforts are important and laudable, they remain open for improvement. Hoffmann (2016), for example, has criticized Facebook for launching

an ethics review process that innovates on process but tells us little about the ethical values informing their product development.

Further, in their study of employees doing the work of ethics inside of numerous Silicon Valley companies, Metcalf and colleagues found considerable tension between trying to resolve thorny ethical dilemmas that emerge within an organization’s data practices and the broader business model and corporate logic that dominates internal decision-making (Metcalf, Moss, & boyd 2019).

5. Research Ethics Boards Guidelines

While many researchers and review boards across the world work without formal guidance, a number of research ethics boards have developed guidelines for Internet research. Of the many such guidelines, the following provide examples for researchers preparing for an REB review, or for boards developing their own policies.

Additional resources are found in Other Internet Resources below.


  • Acquisti, Alessandro and Ralph Gross, 2006, “Imagined Communities: Awareness, Information Sharing, and Privacy on the Facebook”, in Privacy Enhancing Technologies: PET 2006, George Danezis and Philippe Golle (eds.), (Lecture Notes in Computer Science 4258), Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 36–58. doi:10.1007/11957454_3
  • Allen, Christina, 1996, “What’s Wrong with the ‘Golden Rule’? Conundrums of Conducting Ethical Research in Cyberspace”, The Information Society, 12(2): 175–188. doi:10.1080/713856146
  • Annas, George J., 2009, “Globalized Clinical Trials and Informed Consent”, New England Journal of Medicine, 360(20): 2050–2053. doi:10.1056/NEJMp0901474
  • Aycock, John, Elizabeth Buchanan, Scott Dexter, and David Dittrich, 2012, “Human Subjects, Agents, or Bots: Current Issues in Ethics and Computer Security Research”, in Financial Cryptography and Data Security, George Danezis, Sven Dietrich, and Kazue Sako (eds.), (Lecture Notes in Computer Science 7126), Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 138–145. doi:10.1007/978-3-642-29889-9_12
  • Banks, Will and Michelle Eble, 2007, “Digital Spaces, Online Environments, and Human Participant Research: Interfacing with Institutional Review Boards”, in Digital Writing Research: Technologies, Methodologies, and Ethical Issues, Heidi A. McKee and Dànielle Nicole DeVoss (eds.), Cresskill, NJ: Hampton Press, pp. 27–47.
  • Barbaro, Michael and Tom Zeller Jr., 2006, “A Face Is Exposed for AOL Searcher No. 4417749”, The New York Times, 9 August 2006, p. A1.
  • Barratt, Monica Jane and Simon Lenton, 2010, “Beyond Recruitment? Participatory Online Research with People Who Use Drugs”, International Journal of Internet Research Ethics, 3(1): 69–86. [Barratt and Lenton 2010 available online]
  • BBC, 2011, “US Scientists ‘Knew Guatemala Syphilis Tests Unethical’”, BBC News, 30 August 2011, sec. Latin America & Caribbean. [BBC 2011 available online]
  • Beauchamp, Tom L. and James F. Childress, 2008, Principles of Biomedical Ethics, Oxford: Oxford University Press.
  • Blackstone, Mary, Lisa Given, Joseph Levy, Michelle McGinn, Patrick O’Neill, Ted Palys, and Will van den Hoonaard, 2008, Extending the Spectrum: The TCPS and Ethical Issues Involving Internet-Based Research, Interagency Advisory Panel and Secretariat on Research Ethics, Ottawa, Canada. [Blackstone et al. 2008 available online]
  • Boehlefeld, Sharon Polancic, 1996, “Doing the Right Thing: Ethical Cyberspace Research”, The Information Society, 12(2): 141–152. doi:10.1080/713856136
  • Boga, Mwanamvua, Alun Davies, Dorcas Kamuya, Samson M. Kinyanjui, Ester Kivaya, Francis Kombe, Trudie Lang, Vicki Marsh, Bibi Mbete, Albert Mlamba, et al., 2011, “Strengthening the Informed Consent Process in International Health Research through Community Engagement: The KEMRI-Wellcome Trust Research Programme Experience”, PLoS Medicine, 8(9): e1001089. doi:10.1371/journal.pmed.1001089
  • Bonneau, Joseph and Sören Preibusch, 2010, “The Privacy Jungle: On the Market for Data Protection in Social Networks”, in Economics of Information Security and Privacy, Tyler Moore, David Pym, and Christos Ioannidis (eds.), Boston: Springer US, pp. 121–167. doi:10.1007/978-1-4419-6967-5_8
  • Booth, Robert, 2014, “Facebook reveals news feed experiment to control emotions”, The Guardian, 29 June 2014. [Booth 2014 available online]
  • Bromseth, Janne C. H., 2002, “Public Places: Public Activities? Methodological Approaches and Ethical Dilemmas in Research on Computer-mediated Communication Contexts”, in Researching ICTs in Context, Andrew Morrison (ed.), InterMedia Report 3/2002, Oslo: University of Oslo, pp. 33–61. [Bromseth 2002 available online]
  • Brothers, Kyle Bertram and Ellen Wright Clayton, 2010, “‘Human Non-Subjects Research’: Privacy and Compliance”, The American Journal of Bioethics, 10(9): 15–17. doi:10.1080/15265161.2010.492891
  • Bruckman, Amy, 2006, “Teaching Students to Study Online Communities Ethically”, Journal of Information Ethics, 15(2): 82–98. doi:10.3172/JIE.15.2.82
  • Buchanan, Elizabeth A. (ed.), 2004, Readings in Virtual Research Ethics: Issues and Controversies, Hershey, PA: Information Science Publishing.
  • –––, 2006, “Introduction: Internet Research Ethics at a Critical Juncture”, Journal of Information Ethics, 15(2): 14–17. doi:10.3172/JIE.15.2.14
  • –––, 2011, “Internet Research Ethics: Past, Present, and Future”, in Consalvo and Ess 2011: 83–108. doi:10.1002/9781444314861.ch5
  • –––, 2016, “Ethics in Digital Research”, in Handbuch Soziale Praktiken und Digitale Alltagswelten, Heidrun Friese, Gala Rebane, Marcus Nolden, and Miriam Schreiter (eds.), Wiesbaden: Springer Fachmedien Wiesbaden, pp. 1–9. doi:10.1007/978-3-658-08460-8_47-1
  • Buchanan, Elizabeth A. and Charles M. Ess, 2008, “Internet Research Ethics: The Field and Its Critical Issues”, in The Handbook of Information and Computer Ethics, Kenneth Einar Himma and Herman T. Tavani (eds.), Hoboken, NJ: John Wiley & Sons, Inc., pp. 273–292. doi:10.1002/9780470281819.ch11
  • –––, 2009, “Internet Research Ethics and the Institutional Review Board: Current Practices and Issues”, ACM SIGCAS Computers and Society, 39(3): 43–49. doi:10.1145/1713066.1713069
  • Buchanan, Elizabeth A. and Erin E. Hvizdak, 2009, “Online Survey Tools: Ethical and Methodological Concerns of Human Research Ethics Committees”, Journal of Empirical Research on Human Research Ethics, 4(2): 37–48. doi:10.1525/jer.2009.4.2.37
  • Buchanan, Elizabeth, John Aycock, Scott Dexter, David Dittrich, and Erin Hvizdak, 2011, “Computer Science Security Research and Human Subjects: Emerging Considerations for Research Ethics Boards”, Journal of Empirical Research on Human Research Ethics, 6(2): 71–83. doi:10.1525/jer.2011.6.2.71
  • Carpenter, Katherine J. and David Dittrich, 2012, “Bridging the Distance: Removing the Technology Buffer and Seeking Consistent Ethical Analysis in Computer Security Research”, in Digital Ethics: Research & Practice (Digital Formations 85), Don Heider and Adrienne Massanari (eds.), New York: Peter Lang, pp. 1–29.
  • [CASRO] Council of American Survey Research Organizations, 2011, “CASRO Code of Standards and Ethics for Survey Research”, first adopted 1977 and revised since. [CASRO code available online]
  • Chen, Jenny J., Natala J. Menezes, and Adam D. Bradley, 2011, “Opportunities for Crowdsourcing Research on Amazon Mechanical Turk”, Interfaces, 5(3). [J. Chen, Menezes, and Bradley 2011 available online]
  • Colvin, Jan and Jane Lanigan, 2005, “Ethical Issues and Best Practice Considerations for Internet Research”, Journal of Family and Consumer Sciences, 97(3): 34–39.
  • Conley, Caryn and Jennifer Tosti-Kharas, 2014, “Crowdsourcing Content Analysis for Managerial Research”, Management Decision, 52(4): 675–688. doi:10.1108/MD-03-2012-0156
  • Consalvo, Mia and Charles Ess (eds.), 2011, The Handbook of Internet Studies, Oxford: Wiley-Blackwell. doi:10.1002/9781444314861
  • Crawford, Kate and Jason Schultz, 2014, “Big Data and Due Process: Toward a Framework to Redress Predictive Privacy Harms”, Boston College Law Review, 55(1): 93–128.
  • Dittrich, David, Michael Bailey, and Sven Dietrich, 2011, “Building an Active Computer Security Ethics Community”, IEEE Security & Privacy Magazine, 9(4): 32–40. doi:10.1109/MSP.2010.199
  • Drew, David A., Long H. Nguyen, Claire J. Steves, Cristina Menni, Maxim Freydin, Thomas Varsavsky, Carole H. Sudre, M. Jorge Cardoso, Sebastien Ourselin, Jonathan Wolf, et al., 2020, “Rapid Implementation of Mobile Technology for Real-Time Epidemiology of COVID-19”, Science, 368(6497): 1362–1367. doi:10.1126/science.abc0473
  • Elgesem, Dag, 2002, “What Is Special about the Ethical Issues in Online Research?”, Ethics and Information Technology, 4(3): 195–203. doi:10.1023/A:1021320510186
  • Emanuel, Ezekiel J., Robert A. Crouch, John D. Arras, Jonathan D. Moreno, and Christine Grady (eds.), 2003, Ethical and Regulatory Aspects of Clinical Research: Readings and Commentary, Baltimore: Johns Hopkins University Press.
  • Ess, Charles, 2016, “Phronesis for machine ethics? Can robots perform ethical judgments?”, Frontiers in Artificial Intelligence and Applications, 290: 386–389. doi:10.3233/978-1-61499-708-5-386
  • Ess, Charles and the Association of Internet Researchers (AoIR) Ethics Working committee, 2002, “Ethical Decision-Making and Internet Research: Recommendations from the AoIR Ethics Working Committee”, Approved by the AoIR, 27 November 2002. [Ess and AoIR 2002 available online]
  • Eysenbach, Gunther, 1999, “Welcome to the Journal of Medical Internet Research”, Journal of Medical Internet Research, 1(1): e5. doi:10.2196/jmir.1.1.e5
  • Eysenbach, Gunther and James E. Till, 2001, “Ethical Issues in Qualitative Research on Internet Communities”, BMJ, 323(7321): 1103–1105. doi:10.1136/bmj.323.7321.1103
  • Fairfield, Joshua A., 2012, “Avatar Experimentation: Human Subjects Research in Virtual Worlds”, U.C. Irvine Law Review, 2: 695–772.
  • Federal Register, 2011, “Submission for Review and Comment: ‘The Menlo Report: Ethical Principles Guiding Information and Communication Technology Research’ (‘Menlo Report’) for the Department of Homeland Security (DHS), Science and Technology, Cyber Security Division (CSD), Protected Repository for the Defense of Infrastructure Against Cyber Threats (PREDICT)”, 28 December 2011, Volume 76, Number 249, Docket No. DHS-2011-0074. [Federal Register 2011 available online]
  • –––, 2017, “Federal Policy for the Protection of Human Subjects”, 19 January 2017, Volume 82, Number 12 [Federal Register 2017 available online]
  • Fiesler, Casey, Nathan Beard, and Brian C. Keegan, 2020, “No Robots, Spiders, or Scrapers: Legal and Ethical Regulation of Data Collection Methods in Social Media Terms of Service”, Proceedings of the International AAAI Conference on Web and Social Media, 14: 187–196. [Fiesler, Beard, and Keegan 2020 available online]
  • Fiesler, Casey, Jeff Hancock, Amy Bruckman, Michael Muller, Cosmin Munteanu, and Melissa Densmore, 2018, “Research Ethics for HCI: A Roundtable Discussion”, in Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems, Montreal, QC, Canada: ACM, pp. 1–5. doi:10.1145/3170427.3186321
  • Fiesler, Casey and Nicholas Proferes, 2018, “‘Participant’ Perceptions of Twitter Research Ethics”, Social Media + Society, 4(1), first online 10 March 2018. doi:10.1177/2056305118763366
  • Flicker, Sarah, Dave Haans, and Harvey Skinner, 2004, “Ethical Dilemmas in Research on Internet Communities”, Qualitative Health Research, 14(1): 124–134. doi:10.1177/1049732303259842
  • Fossheim, Hallvard and Helene Ingierd (eds.), 2016, Internet Research Ethics, Oslo: Cappelen Damm Akademisk/NOASP. doi:10.17585/noasp.3.1
  • Frankel, Mark S. and Sanyin Siang, 1999, “Ethical and Legal Aspects of Human Subjects Research in Cyberspace”, A Report of a Workshop, 10–11 June 1999, Washington, DC: American Association for the Advancement of Science. [Frankel and Siang 1999 available online]
  • Franzke, Aline Shakti, Anja Bechmann, Michael Zimmer, Charles M. Ess, and the Association of Internet Researchers (AoIR), 2020, Internet Research: Ethical Guidelines 3.0, AoIR. [Franzke et al. available online (pdf)]
  • Frauenberger, Christopher, Amy S. Bruckman, Cosmin Munteanu, Melissa Densmore, and Jenny Waycott, 2017, “Research Ethics in HCI: A Town Hall Meeting”, in Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems – CHI EA ’17, Denver: ACM Press, pp. 1295–1299. doi:10.1145/3027063.3051135
  • Gaw, Allan and Michael H. J. Burns, 2011, On Moral Grounds: Lessons from the History of Research Ethics, Westerwood, Glasgow: SA Press.
  • [GDPR] General Data Protection Regulation (GDPR), 2016, “Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46”. [available online]
  • Gilbert, Brendan James, 2009, “Getting to Conscionable: Negotiating Virtual Worlds’ End User License Agreements without Getting Externally Regulated”, Journal of International Commercial Law and Technology, 4(4): 238–251. [Gilbert 2009 available online]
  • Glickman, Seth W., Sam Galhenage, Lindsay McNair, Zachry Barber, Keyur Patel, Kevin A. Schulman, and John G. McHutchison, 2012, “The Potential Influence of Internet-Based Social Networking on the Conduct of Clinical Research Studies”, Journal of Empirical Research on Human Research Ethics, 7(1): 71–80. doi:10.1525/jer.2012.7.1.71
  • Goldfarb, Norman M., 2008, “Age of Consent for Clinical Research”, Journal of Clinical Research Best Practices, 4(6). [Goldfarb 2008 available online]
  • [HHS] Health and Human Services, 2017, “Excerpts from the January 19, 2017 Revised Common Rule Preamble”. [HHS 2017 available online]
  • Hill, Kashmir, 2014, “Facebook Added ‘Research’ To User Agreement 4 Months After Emotion Manipulation Study”, 30 June 2014. [Hill 2014 available online]
  • Hoffmann, Anna Lauren, 2016, “Facebook has a New Process for Discussing Ethics. But is It Ethical?” The Guardian, 17 June 2016. [Hoffmann 2016 available online]
  • Homeland Security Department, 2011, “Submission for Review and Comment: ‘The Menlo Report: Ethical Principles Guiding Information and Communication Technology Research’”, Federal Register: The Daily Journal of the United States Government, FR Doc. 2011-3323, 28 December 2011. [Homeland Security Department 2011 available online].
  • Hubach, Randolph D., Andrew O’Neil, Mollie Stowe, Zachary Giano, Brenda Curtis, and Celia B. Fisher, forthcoming, “Perceived Confidentiality Risks of Mobile Technology-Based Ecologic Momentary Assessment to Assess High-Risk Behaviors Among Rural Men Who Have Sex with Men”, Archives of Sexual Behavior, first online: 20 February 2020. doi:10.1007/s10508-019-01612-x
  • Hudson, James M. and Amy Bruckman, 2004, “‘Go Away’: Participant Objections to Being Studied and the Ethics of Chatroom Research”, The Information Society, 20(2): 127–139. doi:10.1080/01972240490423030
  • –––, 2005, “Using Empirical Data to Reason about Internet Research Ethics”, in ECSCW 2005: Proceedings of the Ninth European Conference on Computer-Supported Cooperative Work, 18–22 September 2005, Paris, France, Hans Gellersen, Kjeld Schmidt, Michel Beaudouin-Lafon, and Wendy Mackay (eds.), Berlin/Heidelberg: Springer-Verlag, pp. 287–306. doi:10.1007/1-4020-4023-7_15
  • Hunsinger, Jeremy, Lisbeth Klastrup, and Matthew Allen (eds.), 2010, International Handbook of Internet Research, Dordrecht: Springer Netherlands. doi:10.1007/978-1-4020-9789-8
  • Illingworth, Nicola, 2001, “The Internet Matters: Exploring the Use of the Internet as a Research Tool”, Sociological Research Online, 6(2): 79–90. doi:10.5153/sro.600 [Illingworth 2001 available online]
  • International Telecommunication Union, 2019, “New ITU Data Reveal Growing Internet Uptake but a Widening Digital Gender Divide”, ITU Media Centre. [ITU 2019 available online]
  • Jackman, Molly and Lauri Kanerva, 2016, “Evolving the IRB: Building Robust Review for Industry Research”, Washington and Lee Law Review Online, 72(3): 442–457.
  • Jacobson, David, 1999, “Doing Research in Cyberspace”, Field Methods, 11(2): 127–145. doi:10.1177/1525822X9901100204
  • Johns, Mark D., Shing-Ling Sarina Chen, and G. Jon Hall (eds.), 2003, Online Social Research: Methods, Issues, and Ethics, New York: Peter Lang.
  • Jones, Arnita, 2008, “AHA Statement on IRB’s and Oral History Research”, American Historical Association Activities, 1 February 2008. [A. Jones 2008 available online]
  • Jones, Steve (ed.), 1999, Doing Internet Research: Critical Issues and Methods for Examining the Net, Thousand Oaks, CA: Sage.
  • Kaplan, Andreas M. and Michael Haenlein, 2010, “Users of the World, Unite! The Challenges and Opportunities of Social Media”, Business Horizons, 53(1): 59–68. doi:10.1016/j.bushor.2009.09.003
  • Kerry, Cameron F., 2020, “Protecting privacy in an AI-driven world” (AI Governance), 10 February 2020, Center for Technology Innovation, Brookings Institute. [Kerry 2020 available online]
  • Keyes, Os, 2019, “Counting the Countless: Why data science is a profound threat for queer people”, Real Life, 8 April 2019. [Keyes 2019 available online]
  • King, Storm A., 1996, “Researching Internet Communities: Proposed Ethical Guidelines for the Reporting of Results”, The Information Society, 12(2): 119–128. doi:10.1080/713856145
  • Kitchin, Heather A., 2003, “The Tri-Council Policy Statement and Research in Cyberspace: Research Ethics, the Internet, and Revising a ‘Living Document’”, Journal of Academic Ethics, 1(4): 397–418. doi:10.1023/B:JAET.0000025671.83557.fa
  • –––, 2008, Research Ethics and the Internet: Negotiating Canada’s Tri-Council’s Policy, Winnipeg, Manitoba: Fernwood Publishing.
  • Kramer, Adam D. I., James E. Guillory, and Jeffrey T. Hancock, 2014, “Experimental Evidence of Massive-Scale Emotional Contagion through Social Networks”, Proceedings of the National Academy of Sciences, 111(24): 8788–8790. doi:10.1073/pnas.1320040111
  • Kraut, Robert, Judith Olson, Mahzarin Banaji, Amy Bruckman, Jeffrey Cohen, and Mick Couper, 2004, “Psychological Research Online: Report of Board of Scientific Affairs’ Advisory Group on the Conduct of Research on the Internet”, American Psychologist, 59(2): 105–117. doi:10.1037/0003-066X.59.2.105
  • Krogstad, Donald J., Samba Diop, Amadou Diallo, Fawaz Mzayek, Joseph Keating, Ousmane A. Koita, and Yéya T. Touré, 2010, “Informed Consent in International Research: The Rationale for Different Approaches”, The American Journal of Tropical Medicine and Hygiene, 83(4): 743–747. doi:10.4269/ajtmh.2010.10-0014
  • Lawson, Danielle, 2004, “Blurring the Boundaries: Ethical Considerations for Online Research Using Synchronous CMC Forums”, in Buchanan 2004: 80–100.
  • Leibovici, Didier G., Suchith Anand, Jerry Swan, James Goulding, Gobe Hobona, Lucy Bastin, Sergiusz Pawlowicz, Mike Jackson, and Richard James, 2010, “Workflow Issues for Health Mapping ‘Mashups’ of OGC”, University of Nottingham, CGS Technical Report, 2010 DL1. [Leibovici et al. 2010 available online]
  • Madejski, Michelle, Maritza Lupe Johnson, and Steven Michael Bellovin, 2011, “The Failure of Online Social Network Privacy Settings”. Columbia Research Report CUCS-010-11, Columbia University. doi:10.7916/D8NG4ZJ1
  • Mann, Chris, 2003, “Generating Data Online: Ethical Concerns and Challenges for the C21 Researcher”, in Thorseth 2003: 31–49.
  • Markham, Annette N., 1998, Life Online: Researching Real Experience in Virtual Space, Walnut Creek, CA: Altamira Press.
  • –––, 2012, “Fabrication as Ethical Practice: Qualitative Inquiry in Ambiguous Internet Contexts”, Information, Communication & Society, 15(3): 334–353. doi:10.1080/1369118X.2011.641993
  • Markham, Annette N. and Nancy K. Baym (eds.), 2008, Internet Inquiry: Conversations about Method, Thousand Oaks, CA: Sage Publications.
  • Markham, Annette N. and Elizabeth Buchanan, 2012, “Ethical Decision-Making and Internet Research: Recommendations from the AoIR Ethics Working Committee (Version 2.0)”, Association of Internet Researchers. [Markham and Buchanan 2012 available online]
  • Markoff, John, 2010, “In a Video Game, Tackling the Complexities of Protein Folding”, New York Times, 4 August 2010. [Markoff 2010 available online]
  • McKee, Heidi A. and James E. Porter, 2009, The Ethics of Internet Research: A Rhetorical, Case-based Process, New York: Peter Lang Publishing.
  • Mendelson, Cindy, 2007, “Recruiting Participants for Research From Online Communities”, CIN: Computers, Informatics, Nursing, 25(6): 317–323. doi:10.1097/01.NCN.0000299653.13777.51
  • Metcalf, Jacob, 2017, “‘The Study Has Been Approved by the IRB’: Gayface AI, Research Hype and the Pervasive Data Ethics Gap”. PERVADE Team: Pervasive Data Ethics for Computational Research, Report. [Metcalf 2017 available online]
  • Metcalf, Jacob, Emanuel Moss, and danah boyd, 2019, “Owning Ethics: Corporate Logics, Silicon Valley, and the Institutionalization of Ethics”, Social Research: An International Quarterly, 86(2): 449–476.
  • Milne, George R. and Mary J. Culnan, 2004, “Strategies for Reducing Online Privacy Risks: Why Consumers Read (or Don’t Read) Online Privacy Notices”, Journal of Interactive Marketing, 18(3): 15–29. doi:10.1002/dir.20009
  • Mondschein, Christopher F. and Cosimo Monda, 2018, “The EU’s General Data Protection Regulation (GDPR) in a Research Context”, in Fundamentals of Clinical Data Science, Pieter Kubben, Michel Dumontier, and Andre Dekker (eds.), Cham: Springer International Publishing, pp. 55–71. doi:10.1007/978-3-319-99713-1_5
  • Moor, James H., 1985, “What Is Computer Ethics?”, Metaphilosophy, 16(4): 266–275. doi:10.1111/j.1467-9973.1985.tb00173.x
  • Alexander, Larry and Michael Moore, 2007, “Deontological Ethics”, The Stanford Encyclopedia of Philosophy (Winter 2007 Edition), Edward N. Zalta (ed.).
  • Narayanan, Arvind and Vitaly Shmatikov, 2008, “Robust de-anonymization of Large Sparse Datasets”, Proceedings of the 29th IEEE Symposium on Security and Privacy, Oakland, CA, May 2008, IEEE, pp. 111–125. doi:10.1109/SP.2008.33 [Narayanan and Shmatikov 2008 available online (pdf)]
  • [NCPHSBBR] The National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research, 1979, “The Belmont Report: Ethical Principles and Guidelines for the Protection of Human Subjects of Research”, Office for Human Research Protections, Department of Health and Human Services, United States. [NCPHSBBR 1979 available online]
  • [NESH] The National Committee for Research Ethics in the Social Sciences and the Humanities [Norway], 2006, “Guidelines for Research Ethics in the Social Sciences, Law, and Humanities”, Published September 2006. [NESH 2006 available online].
  • –––, 2019, “A Guide to Internet Research Ethics”. [NESH 2019 available online].
  • Nissenbaum, Helen, 2009, Privacy in Context: Technology, Policy, and the Integrity of Social Life, Stanford, CA: Stanford University Press.
  • [NSTC] National Science and Technology Council, 2016, “National Privacy Research Strategy”, Office of the President of the United States, June 2016. [NSTC 2016 available online]
  • Nuremberg Code, 1947 (1996), “The Nuremberg Code”, BMJ, 313.
  • Ohm, Paul, 2010, “Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization”, UCLA Law Review, 57: 1701–1777.
  • Overbeke, Grace, 2008, “Pro-Anorexia Websites: Content, Impact, and Explanations of Popularity”, Mind Matters: The Wesleyan Journal of Psychology, 3: 49–62. [Overbeke 2008 available online]
  • [PRIM&R] Public Responsibility in Medicine and Research, Bankert, E., Gordon, B., Hurley, E., and Shriver, S. (eds), 2021, Institutional Review Board: Management and Function (third edition). Burlington, MA: Jones and Bartlett.
  • Reid, Elizabeth, 1996, “Informed Consent in the Study of On-Line Communities: A Reflection on the Effects of Computer-Mediated Social Research”, The Information Society, 12(2): 169–174. doi:10.1080/713856138
  • Reynolds, Ren, and Melissa de Zwart, 2010, “The Duty to ‘Play’: Ethics, EULAs and MMOs”, International Journal of Internet Research Ethics, 3(1): 48–68. [Reynolds & de Zwart 2010 available online]
  • Ritchie, Donald A., 2003, Doing Oral History: A Practical Guide, New York: Oxford University Press.
  • Rosenberg, Åsa, 2010, “Virtual World Research Ethics and the Private/Public Distinction”, International Journal of Internet Research Ethics, 3(1): 23–37.
  • Rosser, B. R. Simon, J. Michael Oakes, Joseph Konstan, Simon Hooper, Keith J. Horvath, Gene P. Danilenko, Katherine E. Nygaard, and Derek J. Smolenski, 2010, “Reducing HIV Risk Behavior of Men Who Have Sex with Men through Persuasive Computing: Results of the Menʼs INTernet Study-II”, AIDS, 24(13): 2099–2107. doi:10.1097/QAD.0b013e32833c4ac7
  • [SACHRP] Secretary’s Advisory Committee to the Office for Human Research Protections, United States Department of Health & Human Services, 2010, “SACHRP July 20–21, 2010 Meeting Presentations”.
  • –––, 2013, “Attachment B: Considerations and Recommendations Concerning Internet Research and Human Subjects Research Regulations, with Revisions”, final document approved 12–13 March 2013. [SACHRP 2013 available online]
  • –––, 2015, “Attachment A: Human Subjects Research Implications of ‘Big Data’ Studies”, 24 April 2015.
  • Samuel, Gabrielle and Elizabeth Buchanan, 2020, “Guest Editorial: Ethical Issues in Social Media Research”, Journal of Empirical Research on Human Research Ethics, 15(1–2): 3–11. doi:10.1177/1556264619901215
  • Scholz, Trebor, 2008, “Market Ideology and the Myths of Web 2.0”, First Monday, 13(3), 3 March 2008. [Scholz 2008 available online]
  • Schwartz, Paul M. and Daniel J. Solove, 2011, “The PII Problem: Privacy and a New Concept of Personally Identifiable Information”, New York University Law Review, 86(6): 1814–1893.
  • Seaboldt, James A. and Randy Kuiper, 1997, “Comparison of Information Obtained from a Usenet Newsgroup and from Drug Information Centers”, American Journal of Health-System Pharmacy, 54(15): 1732–1735. doi:10.1093/ajhp/54.15.1732
  • Sharf, Barbara F., 1997, “Communicating Breast Cancer On-Line: Support and Empowerment on the Internet”, Women & Health, 26(1): 65–84. doi:10.1300/J013v26n01_05
  • Sieber, Joan E., 1992, Planning Ethically Responsible Research: A Guide for Students and Internal Review Boards, Thousand Oaks, CA: Sage.
  • –––, 2015, Planning Ethically Responsible Research: A Guide for Students and Internal Review Boards, second edition, Thousand Oaks, CA: Sage.
  • Simmhan, Yogesh, Roger Barga, Catharine van Ingen, Ed Lazowska, and Alex Szalay, 2008, “On Building Scientific Workflow Systems for Data Management in the Cloud”, in 2008 IEEE Fourth International Conference on EScience, Indianapolis, IN: IEEE, pp. 434–435. doi:10.1109/eScience.2008.150
  • Simmhan, Yogesh, Catharine van Ingen, Girish Subramanian, and Jie Li, 2009, “Bridging the Gap between the Cloud and an EScience Application Platform”. Microsoft Research Tech Report MSR-TR-2009-2021. [Simmhan et al. 2009 available online]
  • Skloot, Rebecca, 2010, The Immortal Life of Henrietta Lacks, New York: Crown Publishers.
  • Sloan, Luke, Curtis Jessop, Tarek Al Baghal, and Matthew Williams, 2020, “Linking Survey and Twitter Data: Informed Consent, Disclosure, Security, and Archiving”, Journal of Empirical Research on Human Research Ethics, 15(1–2): 63–76.
  • Smith, Michael A. and Brant Leigh, 1997, “Virtual Subjects: Using the Internet as an Alternative Source of Subjects and Research Environment”, Behavior Research Methods, Instruments, & Computers, 29(4): 496–505. doi:10.3758/BF03210601
  • Spriggs, Merle, 2010, A Handbook for Human Research Ethics Committees and Researchers: Understanding Consent in Research Involving Children: The Ethical Issues, Melbourne: The University of Melbourne/Murdoch Childrens Research Institute/The Royal Children’s Hospital Melbourne, version 4. [Spriggs 2010 available online]
  • Stahl, Bernd Carsten and David Wright, 2018, “Ethics and Privacy in AI and Big Data: Implementing Responsible Research and Innovation”, IEEE Security & Privacy, 16(3): 26–33. doi:10.1109/MSP.2018.2701164
  • Sveningsson, Malin, 2004, “Ethics in Internet Ethnography” in Buchanan 2004: 45–61.
  • Sweeney, Latanya, 2002, “K-Anonymity: A Model for Protecting Privacy”, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5): 557–570. doi:10.1142/S0218488502001648
  • Thomas, Jim, 2004, “Reexamining the Ethics of Internet Research: Facing the Challenge of Overzealous Oversight”, in Johns, Chen, and Hall 2003: 187–201.
  • Thorseth, May (ed.), 2003, Applied Ethics in Internet Research (Programme for Applied Ethics Publication Series No. 1), Trondheim, Norway: NTNU University Press.
  • Tsai, Janice, Lorrie Faith Cranor, Alessandro Acquisti, and Christina M. Fong, 2006, “What’s It To You? A Survey of Online Privacy Concerns and Risks”. NET Institute Working Paper No. 06–29. doi:10.2139/ssrn.941708
  • Turkle, Sherry, 1997, Life on the Screen: Identity in the Age of the Internet, New York: Touchstone.
  • Van Heerden, Alastair, Doug Wassenaar, Zaynab Essack, Khanya Vilakazi, and Brandon A. Kohrt, 2020, “In-Home Passive Sensor Data Collection and Its Implications for Social Media Research: Perspectives of Community Women in Rural South Africa”, Journal of Empirical Research on Human Research Ethics, 15(1–2): 97–107. doi:10.1177/1556264619881334
  • Vitak, Jessica, Nicholas Proferes, Katie Shilton, and Zahra Ashktorab, 2017, “Ethics Regulation in Social Computing Research: Examining the Role of Institutional Review Boards”, Journal of Empirical Research on Human Research Ethics, 12(5): 372–382. doi:10.1177/1556264617725200
  • Walstrom, Mary K., 2004, “Ethics and Engagement in Communication Scholarship: Analyzing Public, Online Support Groups as Researcher/Participant-Experiencer”, in Buchanan 2004: 174–202.
  • Walther, Joseph B., 2002, “Research Ethics in Internet-Enabled Research: Human Subjects Issues and Methodological Myopia”, Ethics and Information Technology, 4(3): 205–216. doi:10.1023/A:1021368426115
  • White, Michele, 2002, “Representations or People?”, Ethics and Information Technology, 4(3): 249–266. doi:10.1023/A:1021376727933
  • World Medical Association, 1964/2008, “Declaration of Helsinki: Ethical Principles for Medical Research Involving Human Subjects”. Adopted by the 18th World Medical Assembly. Amended 1975, 1983, 1989, 1996, 2000, 2002, 2004, 2008. [Declaration of Helsinki available online]
  • Wright, David R., 2006, “Research Ethics and Computer Science: An Unconsummated Marriage”, in Proceedings of the 24th Annual Conference on Design of Communication: SIGDOC ’06, Myrtle Beach, SC: ACM Press, pp. 196–201. doi:10.1145/1166324.1166369
  • Zimmer, Michael T., 2010, “‘But the Data Is Already Public’: On the Ethics of Research in Facebook”, Ethics and Information Technology, 12(4): 313–325. doi:10.1007/s10676-010-9227-5
  • –––, 2016, “OkCupid Study Reveals the Perils of Big-Data Science”, 14 May 2016. [Zimmer 2016 available online]
  • Zimmer, Michael and Edward Chapman, 2020, “Ethical Review Boards and Pervasive Data Research: Gaps and Opportunities”, Paper presented at AoIR 2020: The 21st Annual Conference of the Association of Internet Researchers. [Zimmer and Chapman 2020 extended abstract available online (pdf)]
  • Zimmer, Michael T. and Katharina Kinder-Kurlanda (eds.), 2017, Internet Research Ethics for the Social Age: New Challenges, Cases, and Contexts, New York: Peter Lang Publishing.

Copyright © 2021 by
Elizabeth A. Buchanan
Michael Zimmer
