Search Engines and Ethics
What is an Internet search engine? Why are search engines problematic from an ethical perspective? In this entry, the available philosophical literature on this topic will be critically reviewed. However, relatively few academic works on the topic of search engines have been written from a philosophical perspective. And only a handful of the existing publications that focus specifically on the ethical aspects of search engines have been contributed by philosophers (see, for example, Nagenborg 2005).
- 1. Introduction and Overview
- 2. Search Engine Development and Evolution: A Short History
- 3. Ethical Implications
- 4. Conclusion
- Other Internet Resources
- Related Entries
It may be difficult to imagine today's world without search engines. Which high school student (at least one living in North America or Europe) has not used a Web search engine to query about some topic or subject? Of course, many Internet users, both young and old, may not consciously distinguish between the search engines they use and the Web browsers that now typically include search engines as a feature of their user interface. But virtually all Internet users have come to expect, and depend on, the instantaneous results they receive in response to their search queries. While there is no shortage of definitions of “search engine,” none has been accepted as the standard or universally agreed-upon definition. For the purposes of this entry, however, we adopt the definition of a (Web) search engine put forth by Halavais (2009, 5–6): “an information retrieval system that allows for keyword searches of distributed digital text.” This definition includes some important technical terms and concepts that, in turn, need defining and further elucidation. Our examination of the key technical concepts underlying search engines is intended to provide a useful context for our analysis of their ethical implications. In this sense, Blanke (2005, 34) is correct that an adequate analysis of the ethical aspects of search engines “requires knowledge about the technology and its functioning.”
We begin with a brief sketch of the history and evolution of search engines, from their conception in the pre-Internet era to the development and implementation of contemporary (“Web 2.0” era) search engines such as Google. Our examination of important historical developments of this technology is intended to address our first question, noted above: “What is a search engine?” It also provides a backdrop for analyzing our second major question, “Why are search engines problematic from an ethical perspective?” In addressing that question, we examine a cluster of ethical concerns involving search engine technology. These include issues ranging from search engine bias and the problem of opacity/non-transparency, to concerns affecting privacy and surveillance, to a set of issues involving censorship and democracy. We note that the work done thus far on search engines within applied ethics, and more specifically within information/computer ethics, has been carried out from a broadly deontological approach. In keeping with this ethical perspective, we limit ourselves to a deontological analysis of the cluster of ethical issues surrounding search engines and, for the purposes of this entry, put aside possible shortcomings internal to, as well as in the application of, deontology. Finally, in the concluding section, we identify and briefly describe two additional issues: (1) moral/social responsibilities that search engine companies may have in light of their “privileged place” in society (Hinman 2005); and (2) broader philosophical issues (especially in the area of epistemology) that are non-ethical in nature. However, an adequate analysis of these issues is beyond the scope of this entry.
Because search engines provide Internet users with access to important information by directing them to links to online resources on a plethora of topics, many are inclined to see search engine technology in a positive light; some might also assume that this technology is “value-neutral.” However, search engines can raise a number of ethical controversies. Before examining these controversies, we first briefly discuss the history of search engine technology via categories that, for our purposes, reflect four distinct eras: (i) Pre-Internet, (ii) Internet (pre-Web), (iii) early Web, and (iv) Web 2.0. We will see how technical developments in each era have had implications for the cluster of ethical issues examined in Section 3.
Today, we tend to associate search engines with computer technology, and perhaps more specifically with Internet-based computing and electronic devices. Yet the early work on search/information retrieval systems was carried out independently of developments in electronic computing. Although the first (general purpose) electronic computer, the ENIAC (Electronic Numerical Integrator And Computer), was announced in February 1946, several decades would pass before Internet search engines became available. Because ENIAC and other early computers were designed primarily to “crunch numbers,” relatively little thought had been given to the kinds of information-retrieval systems that could be used to search through the large amount of data that those non-networked (or “stand-alone”) computers were capable of storing. However, some information theorists had begun to worry about the amount of information that was becoming available during this period and that, in all likelihood, would proliferate with the advent of computers. In particular, they were concerned about how an ever-expanding repository of information could be organized and retrieved in a practical way. Halavais (2009, 13) notes that early computers “drew on the ideas of librarians and filing clerks” for arranging the stored information that would be retrieved. But some of the leading thinkers in the emerging field of information retrieval (or IR), which Van Couvering (2008) describes as a “hybrid” academic discipline combining elements of information science and computer science, saw that traditional methods for retrieving information would not be effective in the era of electronic computer systems.
One visionary who saw the need for a new kind of organizing and retrieval scheme to manage the expanding volume of information was Vannevar Bush, perhaps the most important figure in the history of information-retrieval/search-engine theory in the pre-Internet era. In his classic article, “As We May Think” (Atlantic Monthly, July 1945), published roughly seven months before ENIAC's official announcement, Bush remarked,
The summation of human experience is being expanded at a prodigious rate, and the means we use for threading through the consequent maze to the momentarily important item is the same as was used in the days of square-rigged ships.
However, Bush believed that a technological solution to this problem was possible through a system he called memex, which he described as a
device in which an individual stores all his books, records, and communications, and which is mechanized so that it can be consulted with exceeding speed and flexibility.
Bush envisioned the memex behaving like an “intricate web of trails” similar to the function of the human mind, which he believed works by a method of “association” and not via an alphabetical index (of the kind typically used in libraries and other cataloging schemes). According to Levy (2008, 508), the most “innovative feature” of Bush's memex system was the establishing of
associative indices between portions of microfilmed text—what we now call hypertext links—so that researchers could follow trails of useful information through masses of literature.
Via Bush's “associative indexing” scheme, different pieces of information could be linked or tied together, “as any single item may be caused at will to select immediately and automatically another.” Thus, Bush is often credited with having anticipated the kinds of search engine functions that would eventually be used on the Internet and the World Wide Web.
Two other important figures in the history of search engine theory who made significant contributions during the pre-Internet era were Gerald Salton and Ted Nelson. Salton, whom some consider the “father of modern search technology,” developed the SMART (Salton's Magic Automatic Retriever of Text) information retrieval system. And Nelson, who developed hypertext in 1963, significantly influenced search engine theory through his Project Xanadu (Wall 2011). Although several years passed before Salton's and Nelson's contributions could be incorporated into modern search engines, it is worth noting that some very “primitive” search functions had been built into the operating systems of some pre-Internet-era computers. For example, Halavais points out that the UNIX operating system supported a search utility called “Finger.” Via the Finger command, a UNIX user could search for one or more users who also had active accounts on a particular UNIX system. To inquire about a UNIX user named “Jones,” for example, one could simply enter the command “Finger Jones” at the command-line prompt. However, this search function was very limited, since the only information that could be retrieved was whether one or more users were currently logged into the system and at what times those users had logged in or out. But, as Halavais points out, this rudimentary search facility also enabled UNIX users to arrange limited social gatherings; e.g., users could “Finger” one another to set up a time to play tennis after work (provided, of course, that the users were logged into their UNIX accounts at that time).
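The finger service was later standardized as a simple network protocol (RFC 1288): a client opens a TCP connection to port 79 on a host, sends a user name followed by a carriage return and line feed, and reads back whatever plain text the server returns. The following Python sketch assumes exactly that request format; the function names are hypothetical, and most hosts today no longer run a finger server at all.

```python
import socket

FINGER_PORT = 79  # well-known TCP port assigned to the finger service


def build_finger_query(user: str = "") -> bytes:
    """Build a finger request: the user name followed by CRLF.
    An empty user name asks the server to list everyone currently logged in."""
    return user.encode("ascii") + b"\r\n"


def finger(host: str, user: str = "", port: int = FINGER_PORT, timeout: float = 5.0) -> str:
    """Send a finger query to `host` and return the server's plain-text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_finger_query(user))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")
```

A call such as `finger("host.example.edu", "jones")` (a placeholder host) would return the server's report on the user "jones"; sending an empty name instead requests the list of logged-in users, which is precisely the kind of presence information discussed above.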
Some of the conceptual and technological breakthroughs that occurred during the pre-Internet era of search engine development made possible two kinds of ethical issues examined in Section 3. First, Bush's “associative indexing” scheme for retrieving information, as opposed to more traditional cataloging schemes based on straightforward inferential rules and techniques, enabled (perhaps unintentionally) some of the kinds of “bias” and opacity/non-transparency-related concerns affecting users' search results that we examine in Section 3.1. Second, the kind of search function made possible by the UNIX “Finger” utility, which enabled UNIX users to retrieve information about the availability of fellow users and about the times at which those users logged into and out of the system, generated some privacy-and-monitoring-related concerns that are among the ethical issues we examine in Sections 3.2 and 3.3.
By the 1960s, plans for developing a vast network of computer networks (i.e., what was eventually to become the Internet) were well underway. And by 1970, work had begun on the ARPANET (Advanced Research Projects Agency Network), which is commonly viewed as the predecessor of the Internet. This US-based project was funded by DARPA (Defense Advanced Research Projects Agency) into the late 1980s, when the National Science Foundation Network (NSFnet) took over the project (Spinello 2011). Although multiple computer networks existed during this period, they were not easily able to communicate and exchange data with one another; a common protocol was needed for the various networks to exchange data between systems. The Transmission Control Protocol/Internet Protocol (TCP/IP) architecture was eventually selected as the standard protocol for the newly emerging Internet. With the implementation of this new standard, there was considerable optimism about the potential for sharing the data that resided in the various computer systems comprising the fledgling Internet. However, one very important challenge still remained: How could Internet users locate the rich resources potentially available to them? To do this, a sophisticated search program/utility, with a robust indexing system, was needed to point to the available computer databases that existed and to identify the content that resided in those databases. The first indexes on the Internet were fairly primitive, and as Halavais (2009) points out, “had to be created by hand.”
With TCP/IP now in place, privately owned computer networks—including LANs (local area networks) and WANs (wide area networks)—were able to communicate with one another and, in principle at least, also able to exchange vast amounts of information over the network. However, another protocol—one that would be layered on top of TCP/IP—was needed to accomplish this objective. So, FTP (File Transfer Protocol), a client/server-based system, was developed and implemented in response to this need. To exchange or share files with a fellow Internet user in this scheme, one first had to set up an FTP server. Users could then upload files to and retrieve them from an FTP server, via an FTP client. Perhaps more importantly, they could also now effectively search for files with one of the newly developed search engines, the first of which was called ARCHIE.
The ARCHIE search engine enabled users to enter queries based on a limited set of features, mainly “file names.” ARCHIE's searchable database of file names consisted of the file directory listings of hundreds of systems accessible via public FTP servers (and eventually via “anonymous” FTP servers as well). In the early 1990s, two other search engines were also fairly prominent: VERONICA (Very Easy Rodent-Oriented Net-Wide Index to Computer Archives) and JUGHEAD (Jonzy's Universal Gopher Hierarchy Excavation and Display). Both VERONICA and JUGHEAD had an advantage over the ARCHIE search engine in that they were able to search for plain-text files, in addition to searching for file names. These two search engines also worked in connection with a system called GOPHER. According to Halavais (2009, 22), GOPHER's “menu-driven approach” to search helped to bring “order to the Internet,” since users “could now navigate through menus that organized documents.”
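To make ARCHIE's limitation concrete, here is a minimal Python sketch of an ARCHIE-style index, assuming the directory listings have already been gathered from FTP servers (the host names and paths are invented for illustration, and the function names are hypothetical). Queries match file names only, never directory names or file contents.

```python
from typing import Dict, List, Tuple

# Hypothetical directory listings collected from public FTP servers.
LISTINGS: Dict[str, List[str]] = {
    "ftp.example.edu": ["/pub/gnu/emacs-18.59.tar.Z", "/pub/docs/rfc959.txt"],
    "ftp.example.org": ["/mirrors/emacs/emacs-18.59.tar.Z", "/pub/games/chess.zip"],
}

Index = Dict[str, List[Tuple[str, str]]]


def build_filename_index(listings: Dict[str, List[str]]) -> Index:
    """Map each lowercased file name (last path component) to the (host, path) pairs where it appears."""
    index: Index = {}
    for host, paths in listings.items():
        for path in paths:
            name = path.rsplit("/", 1)[-1].lower()
            index.setdefault(name, []).append((host, path))
    return index


def archie_search(index: Index, pattern: str) -> List[Tuple[str, str]]:
    """Return (host, path) pairs whose file name contains `pattern`, case-insensitively.
    Directory names and file contents are never examined."""
    pattern = pattern.lower()
    hits: List[Tuple[str, str]] = []
    for name in sorted(index):
        if pattern in name:
            hits.extend(index[name])
    return hits


index = build_filename_index(LISTINGS)
print(archie_search(index, "emacs"))    # both copies of emacs-18.59.tar.Z
print(archie_search(index, "mirrors"))  # [] because only file names are searched
```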
Some of the technological breakthroughs that occurred during the early-Internet era of search engine development exacerbated a privacy-related ethical issue examined in Section 3. Specifically, Internet-wide search functions, enabled by compliance with the TCP/IP and FTP protocols, dramatically increased the scope of the privacy-and-monitoring concerns (initially generated in the pre-Internet era via applications such as the UNIX “Finger” utility) that we examine in Sections 3.2 and 3.3. Additionally, “anonymous” FTP servers, also developed in this period, made it possible for technically savvy users to upload proprietary files, such as copyrighted software applications, onto the Internet anonymously. And the indexing schemes supported by the ARCHIE and GOPHER search systems enabled users to search for and download/share those proprietary files with relative anonymity. Although intellectual property issues are not included among the ethical concerns examined in Section 3, it is worth noting that the development of some search-engine-related applications during this era paved the way for the kinds of illegal file-sharing practices involving copyrighted music that arose in connection with the Napster site in the late 1990s.
The first Web site was developed in 1991 (at CERN, the European laboratory for particle physics) by Tim Berners-Lee, who also founded the World Wide Web Consortium (W3C) at MIT in 1994. The World Wide Web was based on the Hypertext Transfer Protocol (HTTP) and used a format called the HyperText Markup Language (HTML) for designing and delivering documents; many non-technical users found navigating the Web much friendlier and more versatile than using GOPHER and FTP to exchange files. For the (HTTP-based) Web to realize its full potential and to become attractive to non-technical users, however, a more intuitive user interface was needed. The Mosaic Web browser, whose developers went on to create Netscape Navigator, became available in 1993 and was one of the first widely used Internet applications to include a graphical user interface (GUI); this interface, with its intuitive features that enabled users to click on hyperlinks, made navigating the Web much easier for non-technical users. Although Netscape Navigator was a Web browser, and not a search engine, it provided a forum in which many specialized Web search engine companies were able to flourish. A host of search engines, most of which were dedicated to specific areas or specific kinds of searches, soon became available. Some search engines that were especially popular during this period were Excite (introduced in 1993) and Lycos and Infoseek (both of which were available in 1994). Others included Looksmart and Alta Vista, introduced in 1995, and Ask.com (originally called AskJeeves) in 1997 (Wall 2011).
Although the internal structure of a search engine is fairly complex (comprising, among other components, programs called “spiders” that “crawl” the Web), the user-interface portion of the search process is quite straightforward and can be summarized in two steps: (1) a user enters a search term, phrase, or “keyword” in a “search box”; and (2) the search engine returns a list of relevant Web “pages” that typically include hyperlinks to the pages listed. Many of the early Web search engines were highly specialized and thus could be viewed as “vertical” in scope (to use current technical parlance regarding search engine technology). For example, Ask.com was designed to accept queries in the form of specific questions and thus could be viewed as a vertical search engine. Halavais defines a vertical search engine as one that limits itself “in terms of topic, medium, region, language, or some other set of constraints, covering that area in greater depth.” (In this sense, vertical search engines are far more capable of drilling down into particular topics than of expanding out into associated subjects.) A few of the popular search engines that flourished during the early Web period, however, were more general, or “horizontal,” in nature. Alta Vista, for instance, was one of the first search engines to fit into this category. Today, most of the major search engines are horizontal, and Google is arguably the best known horizontal search engine. We should note, however, that vertical search engines still play an important role today. Consider an example where one uses Google, or an alternative horizontal search engine such as Yahoo! or (Microsoft's) Bing, to locate the Web site for Bates College. Once the user has successfully accessed the main page on the Bates site, she can then use Bates' local search facility, a vertical search engine, to retrieve information about faculty and staff who work at that college, about various academic programs and co-curricular activities sponsored by that college, and so forth. Within that vertical search engine, however, the user cannot retrieve broader information about faculty and academic programs at related colleges and universities or about related topics in general (as she could when using a horizontal search engine).
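Halavais's phrase “keyword searches of distributed digital text,” together with the two-step interaction described above, can be illustrated with an inverted index, a standard data structure that maps each word to the pages containing it. The Python sketch below is a toy, not how any particular engine works: the URLs are invented, matching is a plain intersection of word sets with no ranking, and the optional `site` parameter is a hypothetical way of imitating a vertical, site-restricted search like the college example.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Set
from urllib.parse import urlsplit


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Inverted index: each word maps to the set of page URLs containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def keyword_search(index: Dict[str, Set[str]], query: str, site: Optional[str] = None) -> List[str]:
    """Return, sorted, the URLs that contain every query word.
    If `site` is given, keep only URLs on that host, imitating a
    vertical (site-restricted) search facility."""
    words = tokenize(query)
    if not words:
        return []
    results = set.intersection(*(index.get(w, set()) for w in words))
    if site is not None:
        results = {url for url in results if urlsplit(url).hostname == site}
    return sorted(results)


PAGES = {
    "https://college-a.example/faculty": "Faculty and staff directory at College A",
    "https://college-a.example/programs": "Academic programs and co-curricular activities at College A",
    "https://college-b.example/programs": "Academic programs at College B",
}
index = build_index(PAGES)
print(keyword_search(index, "academic programs"))                            # horizontal: both colleges
print(keyword_search(index, "academic programs", site="college-a.example"))  # vertical: College A only
```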
Another type of Web search engine is a meta search engine, which, as its name suggests, draws on the results of multiple (specialized) search engines and then combines and re-ranks those results. One of the first, and perhaps most popular, meta search engines in the mid-to-late 1990s was HotBot (Wall 2011). Meta search engines had a much more important role to play during the early years of the Web. As search engines improved and became more sophisticated, the need for meta search dramatically declined. Today, most general-purpose (horizontal) search engines, such as Google and Bing, are able, via their own aggregation schemes, to return ranked results comparable to those that meta search engines once provided. In fact, the founders of Google have described their search engine as an “aggregator of information.”
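One simple way to “combine and re-rank” results from several engines is a Borda count, in which each engine's list awards points by position. The Python sketch below uses that rule purely as an illustration; the text does not say which merging method HotBot or any other meta search engine actually used, and the site names are placeholders.

```python
from collections import defaultdict
from typing import Dict, List


def borda_merge(result_lists: List[List[str]]) -> List[str]:
    """Merge several ranked result lists into one.
    In a list of length n, the item at 0-based position i earns n - i points;
    items are re-ranked by total points, with ties broken alphabetically.
    Each input list is assumed to contain no duplicates."""
    scores: Dict[str, int] = defaultdict(int)
    for results in result_lists:
        n = len(results)
        for i, url in enumerate(results):
            scores[url] += n - i
    return sorted(scores, key=lambda url: (-scores[url], url))


engine_1 = ["site-a", "site-b", "site-c"]
engine_2 = ["site-b", "site-c", "site-d"]
engine_3 = ["site-b", "site-a"]
print(borda_merge([engine_1, engine_2, engine_3]))  # ['site-b', 'site-a', 'site-c', 'site-d']
```

Here site-b collects 2 + 3 + 2 = 7 points and comes out on top because all three engines rank it near the top; a rule like this rewards agreement across sources rather than any single engine's judgment.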
Some of the technological breakthroughs that occurred during the “early Web” era of search engine development helped to make possible two kinds of privacy-related ethical issues examined in Section 3. First, the vast amount of online information about ordinary people that became accessible to Web-based search engines during this era made it possible for those people to become the “targets” of online searches conducted by anyone who had access to the Internet; this concern is examined in Section 3.2. Second, the practice of aggregating personal information, which was being routinely collected by major search engine companies and their advertisers, contributed significantly to the data-mining-related privacy issues that are examined in Section 3.3.
Although the expression “Web 2.0” is now commonly used to differentiate the current Web environment from the early Web, critics point out that this expression is somewhat vague or imprecise. Whereas the early Web (sometimes referred to as “Web 1.0”) has been described as an online environment that was essentially passive or static, insofar as one could simply view the contents of a Web site that had been set up by an organization or an individual (e.g., when one visited someone's “home page”), Web 2.0 is more dynamic in that it supports many interactive or “participatory” features. In a Web 2.0 environment, for example, users can interact and collaborate with others in ways that were not previously possible. These collaborative features include wikis (with Wikipedia being the best known example), as well as blogs and social networking applications (such as Facebook and Twitter). Of course, the relevant question for us to consider is whether the Web 2.0 environment itself either changes or significantly affects the functions of search engines and, more importantly, the ethical issues they generate.
It is not clear whether we can accurately label current search engines as “Web 2.0 search engines,” even though they operate in a Web 2.0 environment. For example, many of the participatory tools and functions that apply to applications such as social networks, blogs, and wikis do not necessarily apply to contemporary search engines. So, it may be more appropriate to use Hinman's phrase “second-generation search engines.” However, O'Reilly (2005) suggests that Google's practice of incorporating user-generated content to provide a “better” Web search environment for users is compatible with the interactive dimensions and objectives of Web 2.0. But despite O'Reilly's interpretation of Google's practices, it is still not clear to the present author that the phrase “Web 2.0 search engines” is warranted; so, we will instead refer to contemporary (or second-generation) search engines as “Web 2.0-era search engines.”
What, exactly, distinguishes a Web 2.0-era search engine from the earlier ones? Hinman notes that the traditional criteria Web search engine companies used to rank sites were based on two factors: (1) the number of visits to a page (i.e., “popularity”), and (2) the “number of other pages that link to a given page.” With respect to the second criterion, Diaz (2008) and others point to an analogy used in ranking the importance of academic papers. They note, for example, that an academic paper is generally viewed to be important if it is cited by many other papers. And that paper is perhaps viewed as even more important if it is cited by highly cited works. Hinman believes that the shift to (what we call) Web 2.0-era search engines occurred when companies, such as Google, “looked more closely at what users wanted to find” (which, as he also points out, is not always the most popular site). He notes, for example, that Google's formula employs the following strategy: “Users' needs → Search terms → Desired site” (Hinman 2005, 22). He also notes that in this scheme,
what the user wants becomes an integral part of the formula, as does the set of search terms commonly used to express what the user wants.
Hinman and others credit Google's success as the premier contemporary Web search engine to the company's proprietary algorithm, called PageRank.
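The citation analogy above can be made precise. In the simplified formula published by Brin and Page (1998), a page's score is built from the scores of the pages linking to it, each divided by that linking page's number of outgoing links, plus a constant term modeling a user who jumps to a random page; Brin and Page suggested a damping factor d = 0.85. Written in a commonly used normalized form, PR(p) = (1 − d)/N + d · Σ PR(q)/L(q), where the sum runs over pages q linking to p, L(q) is q's number of outgoing links, and N is the number of pages. The Python sketch below computes that normalized variant by power iteration on a hypothetical four-page graph; the even redistribution of score from pages with no outgoing links is an assumption of this sketch. Google's production ranking is proprietary and uses many more signals, so this illustrates the idea only.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> Dict[str, float]:
    """Simplified PageRank by power iteration.
    `links` maps each page to the pages it links to. Each page passes a
    `damping` share of its score, split evenly, to the pages it links to;
    the remaining (1 - damping) is spread evenly over all pages. Pages with
    no outgoing links ("dangling" pages) spread their score evenly too."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    pages = sorted(nodes)
    if not pages:
        return {}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            for t in targets:
                new[t] += damping * rank[p] / len(targets)
        converged = sum(abs(new[p] - rank[p]) for p in pages) < tol
        rank = new
        if converged:
            break
    return rank


# Hypothetical graph: a, b, and d all link to c; c links only to a.
LINKS = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(LINKS)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))  # c ranks first, then a (endorsed by the highly ranked c)
```

Page a has only one incoming link, yet it outranks b and d because that link comes from c, the most-linked page, which mirrors the point that a paper cited by highly cited works gains importance.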
Zimmer (2008, 77) believes that Google's ultimate goal is to “create ‘the perfect search engine’ that will provide only intuitive, personalized, and relevant results.” Halpern (2011) points out that the search process has already “become personalized”—i.e., “instead of being universal, it is idiosyncratic and oddly peremptory.” And Pariser (2011), who asserts that “there is no standard Google anymore,” also notes that with “personalized search,” the result(s) suggested by Google's algorithm is probably the best match for the search query. Some ethical implications affecting the personalization of search algorithms are examined in Section 3.4.
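Personalization can be pictured as a second ranking step that adjusts a shared base score using a profile of the individual user. The Python sketch below is a hypothetical illustration of why, in Pariser's phrase, there may be “no standard Google”: the scoring rule, the topic labels, the profile format, and the URLs are all assumptions, not a description of Google's actual algorithm.

```python
from typing import Dict, List, Set, Tuple

# (url, base relevance score shared by all users, topic labels); all values are invented.
Result = Tuple[str, float, Set[str]]


def personalize(results: List[Result], profile: Dict[str, float], weight: float = 1.0) -> List[str]:
    """Re-rank results for one user.
    `profile` maps a topic to that user's inferred interest (e.g., from past clicks);
    final score = base score + weight * sum of the user's interest in the result's topics.
    Ties are broken by URL."""
    def score(item: Result) -> float:
        _url, base, topics = item
        return base + weight * sum(profile.get(topic, 0.0) for topic in topics)

    ranked = sorted(results, key=lambda item: (-score(item), item[0]))
    return [url for url, _, _ in ranked]


RESULTS: List[Result] = [
    ("https://lab.example/stem-cell-research", 1.0, {"science"}),
    ("https://forum.example/stem-cell-debate", 0.9, {"politics"}),
]

print(personalize(RESULTS, {}))                 # no profile: the shared base order
print(personalize(RESULTS, {"politics": 0.5}))  # same query, different user, different order
```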
Most Internet users are well aware of the virtues of search engines. As we noted earlier, many of us now depend on them to direct us to links to useful information that affect nearly all facets of our day-to-day lives—information about work, travel, recreation, entertainment, finances, politics, news, sports, music, education, and so forth. However, as we also noted earlier, the use of search engines has generated a cluster of ethical concerns. We organize these concerns into four broad categories: (i) search-engine bias and the problem of opacity/non-transparency, (ii) personal privacy and informed consent, (iii) monitoring and surveillance, and (iv) censorship and democracy.
What is search-engine bias, and why is it controversial? In reviewing the literature on this topic, it would seem that the phrase “search-engine bias” has been used to describe at least three distinct, albeit sometimes overlapping, concerns: (1) search-engine technology is not neutral, but instead has embedded features in its design that favor some values over others; (2) major search engines systematically favor some sites (and some kinds of sites) over others in the lists of results they return in response to user search queries; and (3) search algorithms do not use objective criteria in generating their lists of results for search queries.
3.1.1 The Non-Neutrality of Search Engines
While some users may assume that search engines are “neutral” or value-free, critics argue that search engine technology, as well as computer technology in general, is value-laden and thus biased because of the kinds of features typically included in its design. For example, Brey (1998, 2004) and others (see, for instance, Friedman and Nissenbaum 1996) have argued that computer technology has certain built-in features that tend to favor some values over others. Brey worries that some of these technological features have embedded values that are “morally opaque.” Because the values embedded in these features are not always apparent to the technical experts who develop computer systems, Brey believes that a methodological framework, which expands upon what he calls the “standard” applied-ethics model typically used in “mainstream computer ethics,” is needed to identify or disclose the “hidden” values at the design stage. He refers to this model as “disclosive computer ethics” (Brey 2004, 55–56).
Identifying the human values embedded in technological design and development has been the main objective of a movement called Value Sensitive Design or VSD, which Friedman, Kahn and Borning (2008, 70) define as a
theoretically grounded approach to the design of technology that accounts for human values in a principled and comprehensive manner throughout the design process.
Friedman et al. illustrate their model using the example of Internet cookies, i.e., small text files that a Web site, via the user's Web browser, places on a user's computer system for the purposes of tracking and recording that user's activities on the site. In particular, they examine the design of cookies in connection with the informed-consent process vis-à-vis Web browsers. The authors argue that embedded features in the design of cookies challenge the value of informed consent and that this value is important because it protects other values such as privacy, autonomy, and trust.
Cookie technology is not only embedded in the design of contemporary Web browsers; it is also used by major search engine companies to acquire information about users. Insofar as these companies place cookies on users' computer systems without first getting their consent, they also seem to contribute to, and perhaps even exacerbate, at least one kind of technology-related bias: one that threatens values such as privacy and autonomy while favoring values associated with surveillance and monitoring. However, since this kind of bias also applies to design issues affecting Web browsers, it is not peculiar to search engines per se.
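The underlying mechanism is simple. A site includes a Set-Cookie header in its HTTP response, the browser stores the value, and the browser returns it in a Cookie header on later requests to that site, which lets the site recognize the same visitor across visits. The Python sketch below uses the standard library's `http.cookies` module to show both sides of that exchange; the cookie name `uid`, its value, and the helper function names are hypothetical. Note that nothing in the exchange itself asks the user for consent, which bears on the informed-consent concern raised above.

```python
from http.cookies import SimpleCookie
from typing import Optional

ONE_YEAR = 365 * 24 * 60 * 60  # seconds


def make_tracking_cookie(user_id: str, max_age: int = ONE_YEAR) -> str:
    """Server side: build the Set-Cookie header asking the browser to store `user_id`."""
    cookie = SimpleCookie()
    cookie["uid"] = user_id             # hypothetical cookie name
    cookie["uid"]["max-age"] = max_age  # how long the browser should keep it
    cookie["uid"]["path"] = "/"         # send it back for every page on the site
    return cookie.output()              # a "Set-Cookie: uid=..." header line


def read_tracking_cookie(cookie_header: str) -> Optional[str]:
    """Server side, on a later request: recover the identifier from the browser's Cookie header."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return cookie["uid"].value if "uid" in cookie else None


print(make_tracking_cookie("a1b2c3"))
print(read_tracking_cookie("uid=a1b2c3; theme=dark"))
```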
3.1.2 The Manipulation of Search Results
Many critics tend to view the schemes used to manipulate search results as the paradigm case of bias in the context of search engines. In an influential paper on this topic, Introna and Nissenbaum (2000) claimed that search engines
systematically exclude certain sites and certain types of sites, in favor of others, systematically giving prominence to some at the expense of others.
There has been considerable speculation as to why this is the case, but we briefly examine two reasons that have been prominent in discussions in the literature: (a) the interests of advertisers who sponsor search engines; and (b) schemes used by technically savvy individuals and organizations to manipulate the ordered ranking of sites returned by search engines. (A third reason that is sometimes suggested has to do with the nature of the algorithms used by major search engine companies; that issue is examined in Section 3.1.3.)
3.1.2.1 Online Advertising Strategies and Search Bias
Some critics point out that search engine companies are “answerable” to the paid advertisers who sponsor them. So, for many of these critics, bias-related concerns affecting the inclusion/exclusion of certain sites can be attributed mainly to the interests of paid advertisers. Google founders Brin and Page (1998, 18), who initially opposed the idea of paid advertisement on search engines, noted that it would seem reasonable to
expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of consumers… Since it is very difficult even for experts to evaluate search engines, search engine bias is particularly insidious…[and] less blatant bias are likely to be tolerated by the market.
It is worth noting that advertising schemes used by search engines have evolved over time. For example, Diaz (2008, 21) notes that banner ads, which were common on the Internet during the Web 1.0 era, have been replaced by “paid placement of ads.” He also notes that “paid listings” (unlike the earlier banner ads) do not always look very different from the normal results returned to users. Elgesem (2008) points out that search engines such as GoTo, whose lists of search results were based entirely on “paid hits” from advertisers, allegedly failed because of user dissatisfaction with the results they received. Eventually, however, Google adopted GoTo's method for generating paid-ad-based search results but physically separated those results from the “organic” results that appear in the center of its pages (Elgesem 2008); this scheme, which contrasts the two different kinds of results, seems to have been accepted by Google users.
Diaz describes two other kinds of bias-related concerns that affect advertising schemes used by search engine companies: (i) the arbitrary (and seemingly inconsistent) criteria used by these companies in accepting advertisements, and (ii) the imprecise (and sometimes confusing) criteria used to separate editorials (approved by a search engine company) from their paid advertisements. (Although the kind of discrimination with regard to the ads that Google accepts or rejects may seem arbitrary in an innocuous sense, Diaz notes that Google has also refused to accept ads from some organizations that have been critical of corporations that were already sponsors for Google.) Regarding (ii), Diaz explains how paid advertisements can sometimes be confused with the editorials that search engine companies also display on their Web pages. Whereas editorials and ads may look very different in newspapers, Diaz notes that this is not always the case in search engines.
Some critics assume that as conflicts affecting online advertisements in the context of search engines are eventually resolved, the amount of bias in search engine results will also decline or perhaps disappear altogether. However, other schemes can also be used to influence the privileging of some sites over others, in terms of both their inclusion (vs. exclusion) and their ranking.
Technological Schemes Used to Manipulate Search Results
We have already noted that some technically savvy individuals and organizations have figured out various strategies for “gaming the system”—i.e., positioning their sites higher in the rankings that search engine companies generate. These schemes are commonly referred to by “insiders” as instances of SEO, or Search Engine Optimization, and we briefly consider what can now be regarded as a classic SEO ploy. Some organizations and individuals had effectively used HTML meta tags and keywords (embedded in HTML source code) to induce search engines to give their sites higher rankings in the results for their respective categories of search. Eventually, however, search engine companies recognized the manipulative aspects of these HTML features and began to disregard them in their ranking algorithms (Goldman 2008).
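To make the meta-keyword ploy concrete, here is a toy sketch. It assumes a hypothetical ranker that simply counts how often a query term appears in a page's body text and in its author-supplied meta keywords. No real search engine is claimed to work exactly this way, and the page URLs and contents are invented. The point is only that when a ranker trusts a field the page owner fully controls, repeating a keyword there is enough to move a page up, and that ignoring the field removes the advantage.

```python
# Toy illustration only: a hypothetical keyword-counting ranker,
# not any real search engine's algorithm.

def score(page, term, use_meta=True):
    """Count occurrences of `term` in the page body and, optionally, in the
    author-supplied meta keywords (which the page owner fully controls)."""
    words = page["body"].lower().split()
    if use_meta:
        words += [kw.lower() for kw in page["meta_keywords"]]
    return words.count(term.lower())


def rank(pages, term, use_meta=True):
    """Return page URLs ordered from highest to lowest score."""
    ordered = sorted(pages, key=lambda p: score(p, term, use_meta), reverse=True)
    return [p["url"] for p in ordered]


# Invented example pages.
pages = [
    {"url": "birdguide.example",
     "body": "a field guide to eagles and where to watch eagles",
     "meta_keywords": ["birds"]},
    {"url": "stuffed.example",
     "body": "cheap tickets on sale now",
     "meta_keywords": ["eagles"] * 20},  # keyword stuffing
]

print(rank(pages, "eagles"))                  # ['stuffed.example', 'birdguide.example']
print(rank(pages, "eagles", use_meta=False))  # ['birdguide.example', 'stuffed.example']
```

The `use_meta=False` switch stands in, very loosely, for the search companies' decision to disregard these HTML features.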
Many organizations now use a different kind of strategy to achieve a higher ranking for their sites—one that takes advantage of the (general) formulas currently used by major search engine companies. Blanke (2005, 34) notes that in Google's algorithm, Web pages “achieve a better ranking if they optimize their relationship within the system of hubs and authorities.” Whereas “authorities” are Web pages that are “linked by many others,” hubs “link themselves to many pages.” Diaz (2008) points out that highly referenced hubs will have the highest Page Ranks. So Web site owners and designers who know how to exploit these factors (as well as how to use various SEO-related schemes) to manipulate ranking in search engine results will have the highest ranked sites. Diaz also notes that these sites “tend to belong to large, well-known technology companies such as Amazon and eBay,” while “millions of typical pages…will have the lowest ranking.” Thus, it would seem that the kind of search engine bias identified by Introna and Nissenbaum, whereby certain Web sites are systematically included/excluded in favor of others, will not necessarily be eliminated simply by resolving conflicts related to paid advertising in the context of search engine companies. In all likelihood, organizations will continue to figure out ways to use SEO-related techniques to achieve a higher ranking for their sites and thus gain better exposure on search engines, especially on Google. As Hinman puts it, “Esse est indicato in Google (to be is to be indexed on Google).”
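The hub/authority vocabulary is usually associated with Jon Kleinberg's HITS algorithm, which is distinct from PageRank. Google's actual ranking formulas are not public. The following is therefore only a minimal sketch of HITS on a small, invented link graph, offered to show how the two notions reinforce each other. A page's authority score is the sum of the hub scores of pages linking to it. A page's hub score is the sum of the authority scores of pages it links to. The scores are renormalized on each round. A page that nobody links to ends with zero authority, however good its content.

```python
import math


def hits(links, iterations=50):
    """Minimal HITS sketch: `links` maps each page to the pages it links to.
    Returns (hub, authority) score dicts, each normalized to unit length."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 0.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the pages that link to it.
        auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub: sum of the authority scores of the pages it links to.
        hub = {n: sum(auth[t] for t in links.get(n, [])) for n in nodes}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


# Hypothetical toy web graph (page names are invented).
links = {
    "portal": ["bigstore", "auction", "blog"],
    "news": ["bigstore", "auction"],
    "blog": ["bigstore"],
    "smallsite": [],
}
hub, auth = hits(links)
print(max(auth, key=auth.get))  # bigstore
print(max(hub, key=hub.get))    # portal
print(auth["smallsite"])        # 0.0
```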
Concerns affecting search engine bias vis-à-vis questions having to do with “objectivity” have two distinct aspects: (A) objectivity regarding criteria used in search algorithms; and (B) objectivity with respect to the results returned by a particular search engine in its responses to multiple users entering the same search query. With regard to (A), we saw that the traditional criteria used by Web search engine companies to rank sites were based on two factors: the number of visits to a page, and the number of other pages that link to a given page; Hinman (2005) believes that this technique gave the user “some semblance of objective criteria.” He points out, for example, that even if search engines were to “get it wrong” in returning the best results for a particular search query, there was an “objective fact of the matter to be gotten wrong.” And even though the early search engines ranked their sites in terms of “popularity,” there was, as Hinman (2005, 22) puts it, a “technical and objective meaning” of popularity. But we also saw that this formula has changed dramatically in the case of Web 2.0-era search engines, where increasingly “personalized algorithms” tend to tailor search results to fit the profile of the user entering the query.
This trend toward the “personalizing” of algorithms feeds directly into concerns affecting (B). Even though many sophisticated users might suspect that the lists of returns for their search queries are biased, for any number of possible reasons—e.g., paid advertising, unfair influence by large corporations, editorial control, and so forth—many search engine users still tend to assume, perhaps naïvely, that when any two users enter the exact same search query in a major search engine such as Google, both will receive identical lists of responses. In other words, even if the formula used is skewed, or biased in a way that favors some sites over others, the search algorithm would nonetheless return results based on a formula that is internally consistent, and thus standard or “objective” in some sense. But this is no longer the case in an era where formulas based on “personalization” generate search results tailored to a user's profile. For example, if I enter the term “eagles” in a search box, the list and the order of returns that I receive will likely depend on the profile that the search engine company has constructed about me. If the company determines that I am interested in biology, for instance, I may be directed to a site sponsored by the Audubon Society. But if, instead, it determines that I am a sports enthusiast living in the Philadelphia area, I may be directed first to the Philadelphia Eagles Web site (or to a related professional football site). Alternatively, if my profile suggests that I like rock/pop music, I may be directed first to the site for The Eagles music group. So there would not appear to be any overall “objective” formula used by the search engine in question.
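The “eagles” example can be sketched as a re-ranking step, under heavily simplified assumptions that are not drawn from any real engine:

- Each candidate result carries hypothetical topic weights.
- Each user has a hypothetical inferred interest profile.
- Results are ordered by the dot product of the two.

The same query and the same candidate set then yield a different top result for each profile. This is exactly the loss of a single, user-independent ordering that the paragraph describes. All URLs, topics, and weights below are invented for illustration.

```python
# Hypothetical candidate results for the query "eagles", each tagged with
# topic weights. Real engines' signals and weights are not public.
RESULTS = [
    {"url": "audubon.example/eagles", "topics": {"biology": 1.0}},
    {"url": "philadelphiaeagles.example",
     "topics": {"football": 1.0, "philadelphia": 0.5}},
    {"url": "eaglesband.example", "topics": {"rock_music": 1.0}},
]


def personalize(results, profile):
    """Order results by affinity: the dot product of a result's topic weights
    with the user's (inferred) interest profile. Ties keep the input order."""
    def affinity(result):
        return sum(profile.get(topic, 0.0) * w
                   for topic, w in result["topics"].items())
    return [r["url"] for r in sorted(results, key=affinity, reverse=True)]


# Hypothetical inferred profiles.
birder = {"biology": 0.9}
sports_fan = {"football": 0.8, "philadelphia": 1.0}
music_fan = {"rock_music": 0.9}

for name, profile in [("birder", birder), ("sports_fan", sports_fan),
                      ("music_fan", music_fan)]:
    print(name, personalize(RESULTS, profile)[0])
# birder audubon.example/eagles
# sports_fan philadelphiaeagles.example
# music_fan eaglesband.example
```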
Some question whether a lack of objectivity with respect to the results returned to users in response to their search queries is necessarily a problem. For example, Blanke (2005, 34) believes that we should “neither demand nor expect to receive information from search engines that is objective” (i.e., information that is either “neutral or complete”). So, he argues that any critique of search engines as being biased “misses its target,” because one should not expect search engines to deliver only neutral and objective results. His rationale for this claim, however, seems to be based on the view that search engine technology “was not designed to do this.” But this analysis seems to beg the question, unless, of course, Blanke means that search engine technology could not, in principle, be designed to deliver neutral and objective results.
Questions concerning objectivity in the context of search engines are also examined by Goldman (2008), who seems to defend—and, at times, perhaps even applaud—search engine bias in that respect. First, he notes that search engine companies “make editorial judgments about what data to collect and how to present that data” (2008, 122). However, he also believes that search engine bias is “necessary and desirable”—i.e., it is “the unavoidable consequence of search engines exercising editorial control over their databases” (p. 127). So he is willing to concede that “search engine companies, like all other media companies, skew results.” But while many assume that search engine bias is undesirable, Goldman sees it as a “beneficial consequence of search engines optimizing content for their users.” He also believes that “the ‘winner takes all’ effect caused by top placement in search results will be mooted by emerging personalized search technology” (p. 121). He further argues that “personalized ranking algorithms” will “reduce the effect of search engine bias because there will likely be multiple ‘top’ search results of a particular search term instead of a single winner [and] personalized algorithms will eliminate many of the current concerns about search engine bias” (p. 130). Thus Goldman seems to suggest, paradoxically perhaps, that any problems affecting objectivity will be solved by increased subjectivity in the form of personalization of results achieved by personalized search algorithms. However, this direction in search engine evolution can have serious negative effects for democracy and democratic ideals (discussed further in Section 3.4).
Concerns affecting objectivity and bias in the context of search engines are also closely related to controversies pertaining to the lack of “openness” or “transparency.” Some critics point out that search engine companies are not fully open, or transparent, both with respect to why they (a) include some sites and not others (in their lists of results for users' queries), and (b) rank some pages in their list of search results higher than others. These kinds of opacity/non-transparency-related concerns tend to fall under a description that Hinman (2005) calls “the problem of algorithm.” Hinman notes that the algorithms that govern searches are well-kept secrets, which he also believes is appropriate. Because Google's PageRank algorithm is “a patented and closely guarded piece of intellectual property”—comprising “500 million variables and 2 billion terms” (Halpern 2011, 4)—we don't know the algorithm's formulas. And this factor, of course, makes it difficult to comment on an algorithm's objectivity or lack thereof.
Another set of worries affecting opacity/non-transparency arises because search engine companies do not always disclose, either fully or clearly, their practices with respect to two important points: (i) whether (and to what extent) they collect information about users; and (ii) what they do with that information once it has been collected. These kinds of opacity/non-transparency concerns involving search engine companies, which are also related to privacy issues affecting monitoring and surveillance, are examined in detail in Section 3.3.
At least two distinct kinds of privacy concerns arise in the context of search engines. One set of privacy issues emerges because search engine companies can collect personal information about search engine users; in this scheme, the users are, in effect, “data subjects” for the search engine companies and their advertisers. However, search engine users themselves—whether acting on their own behalf or on behalf of organizations that hire them—can use the technology to conduct online searches about people. In this case, the targeted people (some of whom may never have used, or even heard of, a search engine) are the subjects of search engine users. In both cases, privacy concerns arise in connection with questions about fairness for the data subjects involved. Consider that many of those who become the subjects of search queries have not explicitly consented to having certain kinds of personal information about them collected, to having personal information about them (that has been collected in some other context) made available on the Web, or to both.
In this section, we examine search-engine-related privacy concerns affecting people who have become the subjects, or “targets,” of queries by search engine users. This kind of privacy concern is exacerbated by the ever-expanding amount of personal information about ordinary people that is currently discoverable by search engines and thus accessible to Internet users. But why, exactly, is this problematic from the vantage point of privacy? For one thing, it is not clear that most people have voluntarily consented to having information about them placed in databases or in online forums that are accessible to search engines (and thus potentially available to any Internet user). And we noted that search engine users, whether they are acting simply on their own, or as representatives of businesses and corporations, can and often do access a wealth of information about many of us via search engines. Privacy advocates question whether this practice is fair, especially to people who have not explicitly consented to having personal information about them included in the online forums and databases that are now so easily searchable because of sophisticated search engine technology.
Privacy concerns affecting practices in which people are the subjects of search queries can be further distinguished in terms of two separate categories, i.e., where search engines are used to: (i) track the location of individuals, sometimes for the purpose of harassing or stalking them; and (ii) acquire personal information about people. We briefly examine each practice.
Regarding (i), one might ask why using search engines to track and locate persons is controversial from a privacy perspective. First, consider that some organizations have developed specialized search engines for questionable purposes, such as stalking people. For example, one search facility (Gawker-Stalker) has been designed specifically for the purpose of stalking famous people, including celebrities. Imagine a case in which a celebrity has been spotted while dining at an upscale restaurant in San Francisco. The person who spots the celebrity can send a “tip” via e-mail to Gawker-Stalker, informing the site's users of her whereabouts. The Gawker site then provides its users, via precise GPS software, with information about exactly where, and at what time, she was last seen. Users interested in stalking that celebrity can then follow her movements electronically, via the Gawker site, or they can locate and follow her in physical space, if they happen to be in the same geographical vicinity as the celebrity at that time (Tavani 2011).
Second, we note that it is not only celebrities and “high-profile” public figures that are vulnerable to being stalked, as well as to having personal information about them accessed, via search engines. Consider the case of Amy Boyer, a twenty-year-old resident of New Hampshire, who was stalked online by a former “admirer” named Liam Youens and who was eventually murdered by him. Using standard Internet search facilities, Youens was able to get all of the information about Boyer that he needed to stalk her—i.e., information about where she lived, worked, and so forth (see, for example, Tavani and Grodzinsky 2002). Incidents such as the Boyer case invite us to question current policies—or, in lieu of clear and explicit policies, our default positions and assumptions—with regard to the amount and the kind of personal information about ordinary persons that is currently accessible to search engine users. It now appears likely that Amy Boyer had no idea that so much personal information about her was so easily accessible online via search engines.
We next consider (ii), the use of search engines to find information about people—not about their location, but about their activities, interests, and backgrounds. As in the Amy Boyer case, these search-engine-related privacy issues also arise when ordinary people become the subjects of search queries. Consider that, increasingly, employers use online search techniques to acquire information about prospective and current employees. It is well known that many employers try to access information on the Facebook accounts of job applicants they are considering. This kind of information has, in certain instances, been used by employers in determining whether or not to hire particular applicants (and possibly also used in deciding whether or not to promote current employees). So, for example, a college student who posts on Facebook one or more pictures of himself drinking alcoholic beverages, or perhaps behaving wildly at a party, can potentially jeopardize his future employment opportunity with a company that might otherwise hire him upon graduating from college. In defense of this kind of “screening” practice used by companies in hiring employees, one could argue that the company has merely elected to use currently available tools to search for information about persons who voluntarily posted material (e.g., in the form of photos, etc.) about themselves on Facebook.
Our primary concern here is with personal information that has not been voluntarily disclosed by persons, but is nonetheless accessible online via search engines. This kind of concern involving access to personal information in online forums is by no means new or recent. Consider that in the decade preceding Facebook, employers had been able to access information about job applicants via online search tools—e.g., they could (and did) use search engines to accomplish this task, simply by entering the name of the individual in a search engine box. Imagine a hypothetical scenario in which a person, Lee, applies for a full-time employment position at Corporation X. Also, imagine that someone on the corporation's search committee for this position decides to conduct an online search about Lee, shortly after receiving her application. Further imagine that in response to the query about Lee, three results are returned by the search engine. One result includes a link to a gay/lesbian organization in which Lee is identified as someone who contributed to a recent event hosted by that organization. Next, imagine that Lee is turned down for the job at Corporation X. Further imagine that Lee becomes curious as to why she might not have been selected for that job and she decides to do an Internet search of her name for the first time. Lee then discovers the search result linking her to the gay/lesbian organization (Tavani 1998). Should Lee infer that she was denied the job because of her apparent association with this organization? Is that a reasonable inference? Maybe not! Nevertheless, an important question arises: Is it fair that someone has posted this information about Lee online, without her consent, to a source that is accessible to one or more search engines? Is that information about Lee now “fair game,” and should it be viewed simply as information that is “up for grabs” (Nissenbaum 2004) and thus appropriate for use by prospective employers?
How is the scenario involving Lee different from a case in which an employer uses information on a Facebook account to screen job applicants? For one thing, Facebook users typically post information about themselves that can be seen by others and thus have voluntarily consented to have that information available for others to access (assuming that they have not restricted access to it through their Facebook privacy settings). They are also aware that such information about them exists on that online forum. But what about job applicants who do not have accounts on Facebook, or on any other social networking site? Are they less vulnerable to online scrutiny by potential employers? Many people may have no idea about either the kind or amount of online personal information about them that is accessible to an employer or to anyone using a search engine.
Next consider a hypothetical scenario similar to Lee's, where Phil, who recently earned a Ph.D., is applying for a faculty position at University X. But suppose that a few disgruntled former students have posted some highly critical and negative remarks about Phil on RateMyProfessor.com. Next, suppose that a member of the faculty search committee at University X conducts an online search on Phil and discovers the disparaging remarks made by the students. Finally, Phil is informed that he has not been selected for the faculty position. Shortly after receiving his letter of rejection, Phil happens to discover the comments about him made by the disgruntled students, by conducting an online search of his name. Would it be unreasonable for Phil to infer that the remarks by these students on RateMyProfessor.com influenced the hiring committee's decision not to select him?
In one sense, Phil's predicament is very similar to Lee's—viz., neither job applicant had any kind of control over what people other than themselves had posted about them in online forums accessible to search engines. However, the negative information posted about Phil was directly related to the kind of criteria that typically would be used in considering an applicant for a faculty position. The information posted about Lee, while not directly related to the job for which she was applying, could nonetheless also harm her chances of gaining that position. In neither case, however, did Lee or Phil have any say about the kind of information about them, or about the accuracy of that information, that could be so easily retrieved online and used by a prospective employer in making a decision about whether or not to hire them.
On the one hand, we can ask what kind of recourse people like Phil and Lee could expect to have in situations such as this—e.g., can they reasonably expect to have control over any kind of information about them that is currently accessible to search engines? But, on the other hand, it may not seem totally unreasonable for them to have some expectation of limited control over their personal information, even if only to be able to challenge the legitimacy of inaccurate information, especially when they had not consented to having it included in online forums and databases accessible to search engines. There is also an aspect of this kind of personal information that overlaps with “public” information. So, perhaps the tension that arises in these scenarios can be viewed as a contemporary variation of the age-old debate about the private vs. public nature of personal information. This tension is further complicated by the fact that most people, as the subjects of online searches, enjoy no normative protection regarding personal information about them that is now available online—mainly because of the presumed “public nature” of this personal information involved (Tavani 2005).
Some forms of personal information enjoy normative protection via specific privacy policies and laws, because they qualify as information about persons that is considered either sensitive or intimate, or both. We can refer to this kind of personal information as Non-Public Personal Information (or NPI). However, many privacy analysts now worry about the ways in which a different kind of personal information—Public Personal Information (or PPI), which is non-confidential and non-intimate in character—is easily collected and exchanged over the Internet. How can PPI and NPI be distinguished? NPI, which, as noted above, is viewed as information about persons that is essentially confidential or sensitive in nature, includes information about a person's finances and medical history. PPI, although also understood as information that is personal in nature, is different from NPI in at least one important respect: it is neither sensitive nor confidential. For example, information about where an individual works or attends school, as well as what kind of automobile he or she owns, can be considered personal information in the sense that it is information about some individual as a particular person. However, this kind of personal information typically does not enjoy the same kinds of privacy protection that have been granted to NPI (Tavani 2011).
Initially, concerns about personal information that can be gathered and exchanged electronically focused mainly on NPI. In response to these concerns, some specific privacy laws and policies were established to protect NPI. But many privacy advocates now also worry about the ways in which PPI is routinely collected and analyzed via digital technologies. They have argued that PPI deserves greater legal and normative protection than it currently has. Nissenbaum (1997, 1998) has referred to the challenge that we face with regard to protecting (the kind of information that we refer to as) PPI as the “problem of protecting privacy in public.” Some privacy advocates argue that our earlier assumptions about what kinds of publicly available information about us need legal (or other kinds of “normative”) protection are no longer adequate because of the way much of that information can now be processed via digital technologies, especially in the commercial sphere. For example, seemingly innocuous information about persons, based on their activities in the public sphere, can be “mined” to create user profiles based on implicit patterns in the data and those profiles (whether accurate or not) can be used to make important decisions affecting people.
We next examine privacy concerns in which search engine users themselves are the data subjects (i.e., for search engine companies). Zimmer (2008, 83) notes that personal information about users is “routinely collected” when they use search engines for their “information-seeking activities.” But why, exactly, is this problematic from the perspective of privacy? For one thing, search engine companies such as Google create a record of every search made by users, and these records are also archived. The topic searched for, as well as the date and time the specific search request is made by a user, are included in the record. Until recently, many people had been unaware that their search queries were being recorded and tracked.
In January 2006, many Google users learned that the search engine company had kept a log of all of their previous searches. At least four major search engine companies had been subpoenaed by the Bush Administration in 2005 for search records covering one week of searches during the summer of 2005. They were Google, Yahoo, AOL, and Microsoft (MSN). Google refused to turn over the information. The other search engine companies would not say how they responded; many assumed, however, that those companies complied with the government's subpoena. Google was requested to turn over two items: (1) the results of search queries/requests it received during a one-week period, and (2) a random list of approximately one million URLs searched. Google argued that turning over this information would: (a) violate the privacy of its users (and undermine their trust in the search engine company), and (b) reveal information about the (proprietary) algorithm and the processes that Google uses, potentially harming its competitive edge as a search service. A court ruled that Google did not have to comply with (1), but it reached a compromise regarding (2), ruling that Google turn over 50,000 URLs to the government (Nissenbaum 2010, 29–30).
The information collected about a user's search queries might seem relatively innocuous—after all, who would be interested in knowing about the kinds of searches we conduct on the Internet, and who would want to use this information against us? On the other hand, however, seemingly innocuous personal information can be mined by information merchants and used to construct personal profiles about us; these profiles, in turn, can be based on information that is not accurate and can be used to make decisions about us that are not fair. For example, imagine a case in which a student happens to be writing a paper on Internet pornography and uses a search engine to acquire some references for her research. Records of this user's search requests could reveal several queries that she made about pornographic Web sites, which in turn might suggest that she is interested in viewing pornography. So individual searches made by a particular user could theoretically be analyzed in ways that construct an inaccurate profile of that user. And records of the searches made by this and other users could later be subpoenaed in court cases (Tavani 2011).
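The risk of an inaccurate profile can be sketched with a deliberately crude, hypothetical profiler. It sees only query strings and tallies interest labels from keyword rules. The rules, labels, and queries below are invented, and real profiling systems are not described in the sources discussed here. The profiler has no access to the user's purpose, so a student researching a topic is labeled by the topic itself.

```python
from collections import Counter

# Hypothetical keyword-to-interest rules of the crude kind a profiler
# might use; invented for illustration.
INTEREST_RULES = {
    "pornography": "adult content",
    "journal": "academic research",
    "citation": "academic research",
}


def infer_interests(queries):
    """Tally interest labels by matching rule keywords inside each query.
    The profiler sees only the query strings, not the user's purpose."""
    counts = Counter()
    for q in queries:
        text = q.lower()
        for keyword, label in INTEREST_RULES.items():
            if keyword in text:
                counts[label] += 1
    return counts.most_common()


student_queries = [
    "internet pornography regulation",
    "pornography statistics united states",
    "pornography and free speech law review",
    "journal articles on online harms",
]
print(infer_interests(student_queries))
# [('adult content', 3), ('academic research', 1)]
```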
As already noted, information about a user's search queries is collected by search engine companies as well as by many different kinds of “information merchants” in the commercial sphere. Halpern (2011, 8) notes that there are approximately 500 companies that are able to track all of our online movements, thereby “mining the raw material of the Web and selling it to…data mining companies.” Pariser (2011) points out that in tracking our activities, Google uses fifty-seven signals—
everything from where you were logging in from to what browser you were using to what you had searched for before, to make guesses about who you were and what kinds of sites you'd like.
And Zimmer (2008, 77) notes that Google integrates information gathered from
Web cookies, detailed server logs, and user accounts… [from Google applications such as Gmail, Google+, and Google Chrome]… which provides a powerful infrastructure of dataveillance to monitor, record, and aggregate users' online activities.
Furthermore, Pariser notes that Google and other major search engine companies use “prediction engines” to construct and refine theories about who we are (and what we want to do next). He also notes that many information merchants regard every “click signal” a user creates as a “commodity” that can be “auctioned off within microseconds to the highest bidding consumer.” Pariser points out that one information merchant, a company called Acxiom, has
accumulated an average of 1500 pieces of data on every person in its database—personal data that ranges from credit scores to medications used.
Of course, some users might respond that they do not feel threatened by this practice; for example, they might be inclined to feel safe from a loss of personal privacy because they assume that the data collected about them is anonymous in the sense that it is identifiable only as an IP address, as opposed to a person's name. However, Zimmer (2008, 83) notes that in 2006, one AOL user could be identified by name “because the Web searches she performed on various topics were recorded and later released by AOL.” It turns out that AOL Research had released over three months' worth of personal search data involving 650,000 users (Wall 2011, 18). Nissenbaum (2010, 30) points out that this was a case in which
certain identities could be extracted from massive records of anonymized search-query data that AOL regularly posted on the Internet for use by the scientific research community.
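Why “anonymized” query logs can still identify people can be shown with a toy sketch. It assumes, as a simplification, a log in which names are removed but each query carries a persistent pseudonymous user number. The log entries and clues below are entirely invented. They are not AOL's data, and the sketch does not claim to reproduce AOL's release format. Linking all queries that share a number produces a search history. Any single clue (a surname, a town) may match many people, but a combination of clues an outsider might know can narrow the candidates to one.

```python
from collections import defaultdict

# Hypothetical "anonymized" log: names removed, but each query carries a
# persistent pseudonymous user number. Entirely invented data.
LOG = [
    (1001, "plumbers in springfield"),
    (2002, "cheap flights to denver"),
    (1001, "springfield high school reunion 1985"),
    (2002, "weather tomorrow"),
    (1001, "smith family genealogy"),
    (3003, "smith and wesson history"),
]


def group_by_user(log):
    """Link every query that shares a pseudonymous ID into one history."""
    histories = defaultdict(list)
    for user_id, query in log:
        histories[user_id].append(query)
    return dict(histories)


def matching_ids(histories, clues):
    """Return IDs whose combined history mentions every clue. Each clue alone
    may be common; together they can narrow the field to one person."""
    return [uid for uid, queries in histories.items()
            if all(any(clue in q for q in queries) for clue in clues)]


histories = group_by_user(LOG)
print(matching_ids(histories, ["smith"]))                 # [1001, 3003]
print(matching_ids(histories, ["smith", "springfield"]))  # [1001]
```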
Zimmer believes that the incident involving AOL is not unique, but is instead one more use of data surveillance or “dataveillance” (a term coined by Roger Clarke in 1988)—i.e., one applied in the context of search queries.
It is also important to consider whether a meaningful distinction can be drawn between monitoring and surveillance in this context. Noting that the two terms are often used interchangeably, Nissenbaum (2010, 22) differentiates between them in the following way. Whereas surveillance is a “form of monitoring ‘from above’ by political regimes and those in authority,” monitoring is used in broader social and “socio-technical” contexts. In Nissenbaum's scheme, both monitoring and surveillance are examples of what she calls “socio-technical contexts,” but they are usually put to different uses. For example, Nissenbaum points out that monitoring can be done by systems “whose explicit purpose is to monitor” (e.g., CCTVs). But she also notes that information itself can constitute a “modality of monitoring.” For example, she points out that Clarke's notion of “dataveillance” includes monitoring practices that involve both interactions and transactions. However, we can generally regard the kinds of practices carried out by information merchants in the consumer sphere as instances of monitoring (rather than surveillance) in Nissenbaum's sense of that term.
We next shift our focus away from privacy concerns about monitoring in the commercial sector to worries about surveillance by government actors with respect to information acquired as a result of users' search queries. Earlier in this section we noted that in 2005, the Bush Administration informed Google that it must turn over a list of all users' queries entered into its search engine during a one-week period (the exact dates were not specified by Google). The Bush Administration's decision to seek information about the search requests of ordinary users triggered significant criticism from many privacy advocates. Although the Bush Administration claimed that it had the authority to seek electronic information in order to fight the “war on terror” and to prevent another September 11-like attack, some critics worried that the government was trying to use the subpoenaed information, not for national defense or anti-terrorism purposes, but rather to gain data to support its stance on the Child Online Protection Act, which had been challenged in a U.S. District Court and was being revisited by Congress (Nissenbaum 2010, 29). These critics also worried about the implications this has for privacy (as an important human value) in the ongoing tension involving security vs. privacy interests. And even if privacy is not an absolute value but is sometimes outweighed by security concerns, as Himma (2007) argues, some critics question the rationale used for obtaining records of search requests made by ordinary citizens.
Hinman (2005) notes that the Patriot Act, passed in the aftermath of 9/11, allowed U.S. government officials to get information from libraries about which books members had borrowed. He then shows how the reasoning used in the case of libraries could easily be extended to search engines—for example,
if the government could see which books someone was taking out of a library, why couldn't it also see which searches we made on search engines?
Hinman also points out that there are several other ways in which a user's search requests can be disclosed because of practices used by major search engine companies such as Google. He further worries that such practices could eventually lead to surveillance and to the suppression of political dissent (as they have in China). Hinman also questions whether Google might have been under political pressure from outside interests (e.g., the Bush Administration) to take down photographs of Abu Ghraib, which were posted but soon removed by that search engine company with no apparent explanation.
In this section, we consider some implications that the surveillance of users' queries by search-engine companies can have for a free and open society. In the early days of the Internet, many people assumed that search engine technology favored democracy and democratic ideals. For example, Introna and Nissenbaum (2000, 169) note that search engines were viewed as a technology that would
…give voice to diverse social, economic, and cultural groups, to members of society not frequently heard in the public sphere [and] empower the traditionally disempowered, giving them access both to typically unreachable modes of power and to previously unavailable troves of information.
However, Introna and Nissenbaum also describe what can be viewed as an “anti-democratic” aspect of contemporary search technology when they note that search engines “systematically exclude” certain Web sites, as well as “certain types of sites,” over others. And Diaz (2008, 11) echoes this concern when he notes that major search engine companies such as Google direct “hundreds of millions of users towards some content and not others, towards some sources and not others.” So, following Diaz (p. 15), we can ask whether the kinds of “independent voices and diverse viewpoints” that are essential for a democracy are capable of being “heard through the filter of search engines.”
Search engines have often been described as the “gatekeepers of cyberspace,” and some critics note that this has significant implications for democracy. For example, Diaz (2008, 11) points out that
if we believe in the principles of deliberative democracy—and especially if we believe that the Web is an open ‘democratic’ medium—then we should expect our search engines to disseminate a broad spectrum of information on any given topic.
Hinman (2005, 25) makes a similar point when he notes that “the flourishing of deliberative democracy is dependent on the free and undistorted access to information.” And because search engines are “increasingly the principal gatekeepers of knowledge,” Hinman argues that “we find ourselves moving in a philosophically dangerous position.” (We briefly return to Hinman's point in the concluding section of this entry.)
Morozov (2011) also describes some concerns for democracy vis-à-vis contemporary search engines by calling attention to the filtering of information that search engines make possible. For one thing, he agrees with Sunstein (2001), who worries that the kind of selectivity made possible by Internet filtering can easily trap us inside our “information cocoons.” And Lessig (2000) suggests that any kind of filtering on the Internet is equivalent to censorship because it blocks out some forms of expression. Morozov points out that whereas Sunstein worries that people could use Internet technology to “overly customize what they read,” the reality is that contemporary search engine companies have already silently done this for them. Morozov's concerns about what search engine companies are now doing through filtering and customization schemes, and why this is problematic for a democracy, are echoed by Pariser (2011, 13), who points out that “personalization filters serve up a kind of invisible autopropaganda, indoctrinating us with our own ideas, amplifying our desire for things that are familiar and leaving us oblivious to the dangers lurking in the dark territory of the unknown.”
Pariser notes that while democracy “requires citizens to see things from one another's point of view,” we are instead increasingly “more enclosed in our own bubbles.” He goes on to note that democracy also “requires a reliance on shared facts,” but instead we are being presented with “parallel but separate universes.” To illustrate how this trend away from citizens having shared facts can be so dangerous for a democracy, Pariser uses the example of the debate about climate change in the U.S. during the past decade. He points out that studies have shown that between 2001 and 2010 many people's beliefs about whether the climate was warming shifted significantly, depending on their affiliation with a major political party. Pariser notes that a Web search for “climate change” will yield very different results for a person whom the search algorithm determines to be a Democrat than for someone it determines to be a Republican. He also notes that the search algorithm will generate different results for someone it determines to be an oil company executive vs. an environmental activist.
Along lines similar to Pariser's, Halpern (2011, 5–6) notes that search engines like Google direct us to material that is most likely to reinforce our own “worldview, ideology, and assumptions” and thus “cut us off from dissenting opinion and conflicting points of view.” Pariser points out that with Google, a user now gets the results that the search engine company believes are best for that particular user. He describes a case where two people entered the keyword “BP” (for British Petroleum) at the time of the accident involving the Deepwater Horizon oil rig in the Gulf of Mexico. In response to one user's query, investment information about BP was returned as the lead result, while the other user received information about the oil spill.
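To make the mechanism that Pariser and Halpern describe more concrete, the following toy sketch in Python shows how a personalization layer could give two users different lead results for the same query. It is a hypothetical illustration under stated assumptions, not Google's or any other company's actual ranking algorithm: the `Result` and `UserProfile` classes, the `personalized_rank` function, the topic labels, and all scores are invented for the example. It assumes that each result has a base relevance score shared by all users, to which a boost derived from the user's inferred interests is added.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Result:
    """A hypothetical search result tagged with topic labels."""
    title: str
    topics: set[str]
    base_score: float  # query-relevance score, identical for every user


@dataclass
class UserProfile:
    """Hypothetical interest weights a system might infer from past clicks."""
    interests: dict[str, float] = field(default_factory=dict)


def personalized_rank(results: list[Result], profile: UserProfile,
                      weight: float = 1.0) -> list[Result]:
    """Re-rank results by adding a per-user boost to the shared base score."""
    def score(r: Result) -> float:
        boost = sum(profile.interests.get(t, 0.0) for t in r.topics)
        return r.base_score + weight * boost

    return sorted(results, key=score, reverse=True)


# Invented candidate results for the query "BP"; scores are illustrative only.
BP_RESULTS = [
    Result("BP share price and investor news", {"finance"}, 0.50),
    Result("Gulf of Mexico oil spill coverage", {"environment"}, 0.50),
    Result("BP corporate homepage", {"corporate"}, 0.55),
]

investor = UserProfile({"finance": 0.4})
activist = UserProfile({"environment": 0.4})
anonymous = UserProfile()  # no inferred interests, so no boost

for name, profile in [("investor", investor), ("activist", activist),
                      ("anonymous", anonymous)]:
    top = personalized_rank(BP_RESULTS, profile)[0]
    print(f"{name}: {top.title}")
```

Because only the per-user boost differs, the profile weighted toward finance sees the investment result first, the profile weighted toward the environment sees the oil-spill coverage first, and the user with no profile sees the result with the highest base score. Nothing in the output tells either user that a boost was applied, which is the kind of invisibility that Pariser's notion of “invisible autopropaganda” targets.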
Not only do some practices by search engine companies pose a threat to democracy and democratic ideals, but other practices (in which search engine companies are arguably complicit) reinforce censorship schemes currently used by non-democratic nations. For example, it is well known that China has succeeded in blocking access to political sites that it regards as threatening. Consider that it blocks ordinary Chinese users from accessing sites associated with topics such as “Tiananmen Square,” “Free Tibet,” and “Dalai Lama.” Critics note that Google agreed to comply with China's censorship laws when the search engine company entered the Chinese market in 2006. Spinello (2012) believes that this agreement violated one of Google's core principles, “don't be evil,” because Google “facilitated and supported” China's censorship regime. But some of Google's defenders have argued that the search engine company's entry into China made it possible for China's residents to have greater access to information overall. Other defenders of Google point out that the search engine giant did not act alone: other major U.S. companies, such as Yahoo and MSN, also complied with China's censorship laws. And Hinman notes that the Chinese government also received cooperation from other American companies, such as Cisco Systems, in establishing the infrastructure or backbone components of its Internet firewall.
In 2010, Google changed its policy for operating in China, and it now directs its Google.cn users to an uncensored site in Hong Kong. However, Hinman believes that we should still worry about Google's willingness to comply with the Chinese government's strict censorship laws when initially setting up its business operations in China. He reasons that if this search engine giant could be so easily influenced by a government that has a relatively low economic impact on its business overall, it could be much more influenced by the U.S. Government, where the political and economic impact would be far more significant. For example, in 2010, it was estimated that Google searches generated approximately $55 billion in economic activity in the U.S. alone; some worry that this factor has also given Google considerable power over companies that rely on it for their Internet traffic (Spinello 2012). They also worry that Google, in an effort to retain that economic power in the U.S. in the future, could conceivably cave in to pressure to comply with government policies (in the U.S. and possibly other democratic nations as well) that might support censorship at some level.
While many initially believed that the Internet would promote democracy by weeding out totalitarian societies, because they are “inefficient and wasteful” (Chorost 2011), Berners-Lee (2010) believes that the Web “we have come to know” is now threatened because both totalitarian and democratic governments alike are “monitoring people's online habits, endangering important human rights.” Perhaps Hinman (2005, 25) best sums up this worry when he remarks,
We risk having our access to information controlled by ever-powerful, increasingly opaque, and almost completely unregulated search engines that could shape and distort our future largely without our knowledge. For the sake of a free society, we must pursue the development of structures of accountability for search engines.
If Berners-Lee, Hinman, and others are correct, it would seem that we have much to worry about indeed as we go forward trying to preserve our basic freedoms in a democracy, while at the same time taking advantage of the many personalization and customization features that search engines such as Google now offer us. Although many would agree with Hinman that search engine companies need to be held accountable in some sense and that increased transparency would be an important “first step,” few, thus far at least, have endorsed the notion of formal government regulation.
The ethical issues examined in this entry have been approached mainly from a deontological perspective, reflecting the published work on this topic to date. However, our analysis of ethical issues affecting search engines is in no way intended to be exhaustive; rather, it reflects the standard or “mainstream” approach that applied ethicists and other scholars have taken thus far in their analyses of search engine controversies. One could easily imagine questions arising from other ethical perspectives as well. From the vantage point of social justice, for example, one could reasonably ask whether search engine companies in the U.S. and other developed nations are morally obligated to help bridge the “information divide” by providing easier and more ubiquitous access (via search technologies) to users in developing nations, especially non-English-speaking users. Also, from a utilitarian or consequentialist perspective, one might ask whether the overall social consequences would be more beneficial if search engine users were legally permitted to retrieve some forms of proprietary electronic media for personal use.
Thus far, issues from the perspectives of business ethics and professional responsibility for search engine companies have not been directly addressed in this entry. One might reasonably ask, for example, whether these companies have any special moral obligations because of their “privileged place” in our society. We saw that a number of authors have referred to search engines as the “gatekeepers of the Web,” and this factor in itself may entail some special responsibilities. Hinman (2005, 21) lists four reasons why search engine companies should shoulder significant social responsibility. First, he notes that search engines “play an absolutely crucial role in the access to information” and that without them, the Web would “simply be inaccessible to us” and thus “almost useless.” Second, he points out that “access to information is crucial for responsible citizenship,” also noting that “citizens in a democracy cannot make informed decisions without access to accurate information.” Third, Hinman notes that search engines have become “central to education,” and he points out that students now search on Google and other major search engines more frequently than they visit libraries. Fourth, he points out that major search engines are owned by private corporations, i.e., by “businesses that are quite properly seeking to make a profit.” Hence, conflicts can easily arise between the corporate profits of search engine companies and the public good.
For example, Nicas (2011, 1) points out that while many search engine companies were initially “content to simply produce search results,” some now are becoming increasingly involved in the nuts and bolts of diverse markets, expanding their offerings to include “everything from online music to local coupons to mobile phones.” Nicas also describes a bias-related concern affecting this trend by noting that when Google recently entered the online travel business, it “began placing its new flight-search service atop general search results”—i.e., above those of other major players in the online travel business such as Orbitz and Expedia.
So it would seem that there are good reasons to be concerned about conflicts of interest involving search engine companies and their role as gatekeepers. Carr (2011, 163) worries that the commercial control that major search engine companies now have over the “distribution of digital information” could ultimately lead to “restrictions on the flow of knowledge.” And Hinman (2008, 67) believes that the control of knowledge that these companies have is “in a very fundamental sense, a public trust, yet it remains firmly ensconced in private hands and behind a veil of corporate secrecy.” Much of this secrecy, as we have seen, is closely tied to the proprietary search algorithms that major search engines use, which also raises the question of whether aspects of these algorithms should be more transparent to the general public. However, Elgesem (2008, 241) argues that search engine companies should not be required to disclose information about their proprietary search algorithms (even though they should be required to make their policies known to users and to follow those policies, because of the important role they have as “contributors to the public use of reason”). The discussion above offers only a brief overview of the controversies surrounding the moral responsibility of search engine companies.
In this entry, we have seen how various ethical issues arose during key developmental stages of search technology that eventually led to contemporary “Web 2.0-era search engines.” Search technology itself has evolved over the past 50-plus years: from pre-computer-based techniques (such as Vannevar Bush's proposed memex) intended to help scientists and professional researchers locate and retrieve important information in a timely manner, to an Internet-based technology that assisted ordinary users in locating (and linking directly to) relevant Web sites in response to their manifold search queries, to a highly sophisticated technology that, in its current form, has become increasingly commodified and personalized to the point that it might now be regarded as a threat to some of our basic freedoms and democratic ideals.
Even though the primary focus has been on identifying and analyzing ethical issues affecting search engines, in closing it is worth reiterating a point made in the introductory section, namely, that search engine technology also has implications for some (non-ethical) philosophical issues. This is especially apparent in epistemology, where some critics raise concerns about our received notions of the nature and justification of knowledge claims in an era of widespread use of, and dependence on, search engines. For example, Hinman (2008, 75) argues that search engines “contribute significantly to the social construction of knowledge”; that is, not only do they provide access to knowledge, but they also increasingly “play a crucial role in the constitution of knowledge itself.” He also claims that search engines “have replaced scientific and scholarly legitimation with a digital view of the vox populi” and are increasingly “providing a new Rangordnung [rank ordering] of knowledge claims that replace traditional legitimation structures” (Hinman 2008, 67). This concern, however, as well as other epistemological and (broader) philosophical issues, is beyond the scope of the present entry.
- Abbate, J., 1999. Inventing the Internet. Cambridge, MA: MIT Press.
- Berners-Lee, T., 2010. “Long Live the Web: A Call for Continued Open Standards and Neutrality,” Scientific American, November. [Berners-Lee 2010 available online].
- Blanke, T., 2005. “Ethical Subjectification and Search Engines: Ethics Reconsidered,” International Review of Information Ethics, 3: 33–38.
- Brey, P., 1998. “The Politics of Computer Systems and the Ethics of Design.” In Computer Ethics: Philosophical Enquiry. Ed. M. J. van den Hoven, Rotterdam, The Netherlands: Erasmus University Press, pp. 64–75.
- –––, 2004. “Disclosive Computer Ethics.” In Readings in CyberEthics. (2nd ed.) Eds. R. A. Spinello and H. T. Tavani, Sudbury, MA: Jones and Bartlett, pp. 55–66.
- Brin, S. and L. Page, 1998. “The Anatomy of a Large-Scale Hypertextual Web Search Engine.” In Seventh International World-Wide Web Conference (WWW 7), Amsterdam: Elsevier.
- Bush, V., 1945. “As We May Think,” Atlantic Monthly, July. [Bush 1945 available online].
- Carr, N., 2011. The Shallows: What the Internet is Doing to Our Brains. New York: Norton.
- Chorost, M., 2011. World Wide Mind: The Coming Integration of Humanity, Machines, and the Internet. New York: Free Press.
- Diaz, A., 2008. “Through the Google Goggles: Sociopolitical Bias in Search Engine Design.” In Web Search: Multidisciplinary Perspectives. Eds. A. Spink and M. Zimmer, Berlin: Springer-Verlag, pp. 11–34.
- Elgesem, D., 2008. “Search Engines and the Public Use of Reason.” Ethics and Information Technology, 10(4): 233–242.
- Friedman, B., P. Kahn, and A. Borning, 2008. “Value Sensitive Design and Information Systems.” In The Handbook of Information and Computer Ethics. Eds. K. E. Himma and H. T. Tavani, Hoboken, NJ: John Wiley and Sons, pp. 69–101.
- Friedman, B. and H. Nissenbaum, 1996. “Bias in Computer Systems,” ACM Transactions on Information Systems, 14(3): 330–347.
- Goldman, E., 2008. “Search Engine Bias and the Demise of Search Engine Utopianism.” In Web Search: Multidisciplinary Perspectives. Eds. A. Spink and M. Zimmer, Berlin: Springer-Verlag, pp. 121–134.
- Halavais, A., 2009. Search Engine Society. Malden, MA: Polity.
- Halpern, S., 2011. “Mind Control and the Internet,” New York Review of Books, June 23. [Halpern 2011 available online]
- Himma, K. E., 2007. “Privacy vs. Security: Why Privacy is Not an Absolute Value or Right,” University of San Diego Law Review (Fourth Annual Editors' Symposium), 45: 857–921.
- Hinman, L. M., 2005. “Esse Est Indicato in Google: Ethical and Political Issues in Search Engines,” International Review of Information Ethics, 3: 19–25.
- –––, 2008. “Searching Ethics: The Role of Search Engines in the Construction and Distribution of Knowledge.” In Web Search: Multidisciplinary Perspectives. Eds. A. Spink and M. Zimmer, Berlin: Springer-Verlag, pp. 67–76.
- Introna, L. and H. Nissenbaum, 2000. “Shaping the Web: Why The Politics of Search Engines Matters,” The Information Society, 16(3): 169–185.
- Lessig, L., 2000. Code and Other Laws of Cyberspace. New York: Basic Books.
- Levy, D. M., 2008. “Information Overload.” In The Handbook of Information and Computer Ethics. Eds. K. E. Himma and H. T. Tavani, Hoboken, NJ: John Wiley and Sons, pp. 497–515.
- Moor, J. H., 1997. “Towards a Theory of Privacy in the Information Age,” Computers and Society, 27(3): 27–32.
- Morozov, E., 2011. “Your Own Facts,” New York Times Sunday Book Review, June 10. [Morozov 2011 available online]
- Nagenborg, M. (ed.), 2005. The Ethics of Search Engines. Special Issue of International Review of Information Ethics. Vol. 3.
- Nicas, J., 2011. “Google Roils Travel,” Wall Street Journal, December 27. [Nicas 2011 available online].
- Nissenbaum, H., 1997. “Toward an Approach to Privacy in Public: Challenges of Information Technology,” Ethics and Behavior, 7(3): 207–219.
- –––, 1998. “Protecting Privacy in an Information Age,” Law and Philosophy, 17: 559–596.
- –––, 2004. “Privacy as Contextual Integrity,” Washington Law Review, 79(1): 119–157.
- –––, 2010. Privacy in Context: Technology, Policy, and the Integrity of Social Life. Palo Alto, CA: Stanford University Press.
- O'Reilly, T., 2005. “What is Web 2.0? Design Patterns and Business Models for the Next Generation of Software,” O'Reilly Media. [O'Reilly 2005 available online].
- Pariser, E., 2011. The Filter Bubble: What the Internet is Hiding from You. New York: Penguin.
- Spinello, R. A., 2011. CyberEthics: Morality and Law in Cyberspace. 4th ed. Sudbury, MA: Jones and Bartlett.
- –––, 2012. “Google in China: Corporate Responsibility on a Censored Internet.” In Investigating Cyber Law and Cyber Ethics: Issues, Impacts, Practices. Eds. A. Dudley, J. Braman, and G. Vincenti, Hershey, PA: IGI Global, pp. 239–253.
- Sunstein, C., 2001. Republic.com. Princeton, NJ: Princeton University Press.
- Tavani, H. T., 1998. “Internet Search Engines and Personal Privacy.” In Computer Ethics: Philosophical Enquiry. Ed. M. J. van den Hoven, Rotterdam, The Netherlands: Erasmus University Press, pp. 214–223.
- –––, 2005. “Search Engines, Personal Information, and the Problem of Protecting Privacy in Public,” International Review of Information Ethics, 3: 39–45.
- –––, 2011. Ethics and Technology: Controversies, Questions, and Strategies for Ethical Computing. 3rd ed. Hoboken, NJ: John Wiley and Sons.
- Tavani, H. T. and F. S. Grodzinsky, 2002. “Cyberstalking, Personal Privacy, and Moral Responsibility,” Ethics and Information Technology, 4(2): 123–132.
- Tavani, H. T. and J. H. Moor, 2001. “Privacy Protection, Control of Information, and Privacy-Enhancing Technologies,” Computers and Society, 31(1): 6–11.
- Van Couvering, E., 2008. “The History of Internet Search Engines: Navigational Media.” In Web Search: Multidisciplinary Perspectives. Eds. A. Spink and M. Zimmer, Berlin: Springer-Verlag, pp. 177–206.
- Wall, A., 2011. “History of Search Engines: From 1945 to Google Today,” Atlantic Online. [Wall 2011 available online].
- Zimmer, M., 2008. “The Gaze of the Perfect Search Engine: Google as an Institution of Dataveillance.” In Web Search: Multidisciplinary Perspectives. Eds. A. Spink and M. Zimmer, Berlin: Springer-Verlag, pp. 77–99.
- The Ethics and Politics of Search Engines, panel discussion, co-sponsored by Santa Clara University Markkula Center for Applied Ethics and the Santa Clara University Center for Science, Technology, and Society, on February 27, 2006.
I am grateful to the following colleagues for their helpful suggestions on earlier drafts of this entry: Jeff Buechner, Lloyd Carr, Jerry Dolan, Fran Grodzinsky, Ken Himma, Larry Hinman, Martin Menke, and Richard Spinello. I would also like to thank the anonymous SEP reviewers for their insightful comments and helpful suggestions.