Our Siri, Ourselves
Yesterday I started seeing some surprising headlines about Siri, the new personal assistant search bot embedded in the new iPhone. Things like "Apparently, Siri systematically excludes abortion clinics from searches," and "Siri can't direct you to an abortion clinic," and "Apple iPhone's Siri Doesn't Locate Abortion Clinics" (since changed).
It turns out this isn't true. What is true is that Siri returns wildly inconsistent and unreliable answers to natural-language questions about things relating to female reproductive health, while seemingly being much more reliable about dude reproductive stuff like condoms and Viagra. I'm a bit hesitant to write about it since I don't have the new iPhone, and thus can't systematically try to find the fault lines myself. Nonetheless, the alarming notices turned a lot of Siri owners into a massive public beta-testing team—which is fantastic—and a lot of good data has emerged. It challenges the idea that Siri or Apple itself alone is "sexist," but reveals a lot about how these things are embedded in both computers and culture. Or more accurately, in computers by culture.
A good place to start is a lengthy post by Danny Sullivan at SearchEngineLand. As Sullivan points out, some people asked Siri where they could get an abortion in major U.S. cities, and were given either nothing or references to crisis pregnancy centers. But other people, in other places, at other times, were given locations for Planned Parenthood or other women's health centers. And if you just tell Siri to search Google for you instead of asking it a natural-language question, it will glumly return a direct Web search.
So right away we know that Siri doesn't "systematically exclude abortion clinics"; it excludes them unsystematically. But computers are systems, so that's not exactly true either. Which leads us into the deep dark hole of how Siri works, so far as people have been able to infer from what little we know about it. And the deep dark hole of how people work, which is more complex and less pleasant.
If you want the ground-level view, there's an excellent pre-controversy article in New Scientist:
Apple won't talk about Siri's underlying technology, though a patent application it filed earlier this year reveals that the software manages these errors by restricting queries to specific areas like dining or the weather. Apple calls such themes, for which Siri has access to databases of information, "active ontologies". For example, the dining ontology contains databases of restaurants, cuisines and dishes, along with information on the concept of a meal - that it involves one or more people gathering to eat.
The active ontology idea is not new - Tom Gruber, one of the inventors of Siri, formally defined it in 1995. What is unusual about Siri is that, unlike earlier grand AI projects, it is "very specifically focused on helping in particular domains", says Philip Resnik, a computational linguist at the University of Maryland in College Park. "If you go out of those domains, all bets are off."
Something called "active ontologies" would later run into problems with gender? Who could have predicted? Besides lit majors, I mean. Unwired Review takes a look at the patent and goes deeper into active ontologies:
E.g. Restaurant/dining active Ontology can include one or several restaurant databases, a number of restaurant review services like Yelp and Zagat, accessed via API, a special dining related vocabulary database, a model of actions that people usually perform when they decide on the next dinner, an access to reservation service like Open Table and the rules for automatically making a reservation through it and entering the reservation to user’s calendar, specially formatted dialogs related to the restaurant choosing and reservation process, etc.
After user request passes through the language recognition/interpretation module, with the help of relevant active ontology Siri tries to figure out user intent. After it does that, the intent is routed to the “Service orchestration component” (SOC). This component figures out what external services can be used to fulfill the request, and translates it into commands that these services can understand, collects the information, sorts it out for the user and performs required actions.
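To make that description concrete, here is a minimal sketch of the flow it outlines: an utterance is matched to a domain, and the domain's services are then queried. Apple has not published how Siri does any of this, so every name here (Ontology, interpret, orchestrate, the stand-in services and their made-up listings) is hypothetical, and the keyword matching is a deliberately crude substitute for real language interpretation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

# Every name below is hypothetical; none of it comes from Apple or the patent.
Service = Callable[[str, str], List[str]]


@dataclass
class Ontology:
    """A toy 'active ontology': a domain, its vocabulary, and its services."""
    name: str
    vocabulary: Set[str]
    services: List[Service] = field(default_factory=list)


def restaurant_listings(query: str, city: str) -> List[str]:
    """Stand-in for a restaurant database or review API (made-up data)."""
    fake_data = {"chicago": ["Example Burger Bar", "Sample Pizza Co."]}
    return fake_data.get(city.lower(), [])


def weather_feed(query: str, city: str) -> List[str]:
    """Stand-in for a weather service."""
    return [f"(placeholder forecast for {city})"]


ONTOLOGIES = [
    Ontology("dining", {"burger", "pizza", "restaurant", "dinner"},
             [restaurant_listings]),
    Ontology("weather", {"weather", "rain", "forecast"}, [weather_feed]),
]


def interpret(utterance: str) -> Optional[Ontology]:
    """Pick the domain whose vocabulary overlaps the utterance most."""
    words = set(utterance.lower().strip("?.!").split())
    best = max(ONTOLOGIES, key=lambda o: len(words & o.vocabulary))
    return best if words & best.vocabulary else None


def orchestrate(utterance: str, city: str) -> Dict[str, object]:
    """Route the request to the matched domain's services and gather results."""
    ontology = interpret(utterance)
    if ontology is None:
        # Outside every known domain: nothing to ask, so fall back.
        return {"domain": None, "results": [], "fallback": "web search"}
    results: List[str] = []
    for service in ontology.services:
        results.extend(service(utterance, city))
    return {"domain": ontology.name, "results": results, "fallback": None}
```

In this toy version, orchestrate("Where is the best burger?", "Chicago") lands in the dining domain, while orchestrate("Where can I get an abortion?", "Chicago") matches no vocabulary at all and drops to the fallback. Nobody decided to exclude the second question; no domain was ever built to catch it.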
Tom Gruber's work on active ontologies can be found at his website, including an interview in which the computer scientist discusses how "every ontology is a treaty—a social agreement—among people with some common motive in sharing." Reading this totally took me back to college. (Maybe, just maybe, when you start talking about computers in the sense of "ontologies" and "treaties" and "social agreements" you want a liberal arts major around as a failsafe; I start at reasonable rates.)
If ontologies are engineered things, then we don’t have to worry so much about whether they are right and get on with the business of building them to do something useful. We can design them to meet functional objectives and constraints. We can build tools that help us manage them and test them. And we can have multiple ontologies that coordinate or compete based on objective criteria rather than brand or authority.
Gruber goes on to discuss the idea of a knowledge portal—kinda like Siri—and the different ways of approaching it:
[M]any people agree that a key “battlefield” for the promotion of ontologies as well as semantic web is the “realization” of advanced knowledge portal. Do you agree with this statement? What do we have to expect from the Knowledge Portals of the next generation?
I would ask what the Knowledge Portal is for. If the goal of a knowledge portal is to provide answers to common questions, for instance, then I would compare the cost and benefits of creating an enforceable categorization of possible questions and answers (a Semantic Web approach) with distributed collaborative approaches such as answer gardens or the open directory or the Wikipedia. These collaborative approaches optimize for content creation over content structure, but they use social feedback mechanisms to informally validate the content. A similar approach could be used to intermediate the content in a knowledge portal (people collaborating on the categorization in the portal). In a collaborative, social environment, the role of the ontology is to help people communicate their intent and to make agreements.
Apple, as a company, favors closed systems. Forget "distributed collaborative approaches"; they don't even like you changing the battery in your phone, a philosophy that came down from Steve Jobs himself. In some ways, Apple made its own bed when it came to the Siri flap with its prudish and inconsistent censorship of the App Store. Fair or not, it follows that people would assume nefariousness over something else related to sex. But from the evidence, we can see that Siri's avoidance of abortion clinics isn't a universal, systemic decision on the part of Apple. As Amadi put it in a long, well-researched post about Siri:
People have suggested that this is about a lack of female programmers. I don’t think it is. One doesn’t have to be female to know that if you’re going to provide your customers with the benefit of the doubt that they’re adults and will give information on where to buy condoms, beer, the names of local escort companies and “tongue in cheek” locations for hiding a dead body, you should provide information about health clinics, especially when customers know their full names and basic locations. I don’t think you need females on your programming staff to know that a person can go to an ob/gyn for birth control, not just a “birth control clinic.” I don’t think that it’s necessary to be female to know that rape is a violent crime and that a rape victim will need a hospital and/or the police before they need a “treatment center.” This isn’t just about gender. This is about something more esoteric and far far less simple to explain.
If we know that the problem isn't exclusively on Siri's end, we can start trying to follow it along the chain. Apple isn't going to come out and say, "okay, so when you input 'abortion clinic,' the voice recognition system passes it along to the 'health' ontology, which interacts with the Yelp API... here, just take a look at the algorithm."
But we do know that Siri relies on Yelp and Wolfram Alpha. For example, Amadi tried asking Siri for "emergency contraception" and got "emergency rooms." If you go to Yelp and start typing "emergency," it autofills "emergency room." Again, we don't really know how Siri interfaces with Yelp, but there's a clue. Now, if you search for "emergency contraception" in Chicago, you do get useful results—a health center, a pharmacy, and Planned Parenthood, though the fourth result is a bar called "Plan B." If you search for it in Pittsburgh, you don't. Palo Alto: useful, but less than Chicago. Detroit: nothing. So if Siri is trying to rely on Yelp to find things in a city, it's dependent on what people are talking about and how they're talking about them.
As has been pointed out in other contexts, Siri is good at finding things if they are well-categorized on the Internet. Those tend to be broader categories rather than specific things, which also mirrors the way information is collected and presented in the real world:
When it works, it works very well. There happen to be a lot of burger options where I live and simply asking Siri “what’s the best burger joint” returned a fairly accurate ranking of my options. The same goes for pizza, but my cake query didn’t really give me useful results either in Raleigh or Durham. Siri is only as smart as the databases that it relies on, and unfortunately simply looking for reviews that mention cake isn’t the best way to direct you to sugary goodness.
Which makes sense if you have any familiarity with how places that publish food reviews approach those categories. We at Chicago, for instance, can give you Chicago's best pizzas—all 25 of them. And the 30 best burgers. Best cake? Um... how about the best birthday cake for your kid? Coffee cake? Sorry.
Siri is going to reflect the biases, interests, and tastes of the real world. Or more specifically, real places in the world, which probably goes a ways to explain why Siri is so inconsistent with female reproductive health. People don't like talking about that, because cooties. On the other hand, ED drugs are a safer topic, as you know if you've ever watched TV and been subjected to Viagra's irritating blooz-loop ads. The Internet is practically a Viagra delivery device. My unproven and likely unprovable suspicion is that Siri is much better at finding you places to buy Viagra because the Web is highly commercialized (because money buys information and structure) and male reproductive stuff is much, much more commercialized.
Abortion clinics are another good example. Places that offer abortions are, for obvious reasons, inclined to be somewhat circumspect about it, and the women who have them are too, because of how society treats them. On the other hand, crisis pregnancy centers are inclined to associate themselves as much as possible with abortion. So I wasn't surprised to find Siri returning crisis pregnancy centers when people searched for abortion clinics. If you search Yelp for "abortion clinic" in Chicago, the first place that comes up is, according to its reviews, a crisis pregnancy center. The second is a false positive, just a neighborhood doctor. Planned Parenthood comes up third, perhaps because the word "abortion" only comes up in the last review.
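One way to picture why results could come out in that order is a toy search that scores a place by its category tags and by how often reviewers use the search term. This is only a sketch under loud assumptions: we don't know how Yelp actually ranks results or how Siri queries it, and the Listing class, the search function, the scoring weights, and all three listings and their reviews are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Listing:
    """A hypothetical review-site entry: a name, category tags, and reviews."""
    name: str
    categories: List[str]
    reviews: List[str] = field(default_factory=list)


def search(listings: List[Listing], term: str) -> List[str]:
    """Rank listings by category match plus review mentions of the term."""
    term = term.lower()
    scored = []
    for listing in listings:
        category_hit = any(term in c.lower() for c in listing.categories)
        mentions = sum(r.lower().count(term) for r in listing.reviews)
        # Assumed weighting: a category tag counts for more than any mention.
        score = (10 if category_hit else 0) + mentions
        if score > 0:
            scored.append((score, listing.name))
    scored.sort(key=lambda pair: -pair[0])  # highest score first
    return [name for _, name in scored]


# Invented listings; none of them is tagged as an abortion provider.
LISTINGS = [
    Listing("Example Women's Health Center", ["Medical Centers"],
            ["Friendly staff.", "Short wait.",
             "They helped me schedule an abortion."]),
    Listing("Example Neighborhood Doctor", ["Family Practice"],
            ["He would not discuss abortion.",
             "Asked about abortion and got a referral."]),
    Listing("Example Crisis Pregnancy Center", ["Counseling"],
            ["Not an abortion clinic, despite the ads.",
             "They talk about abortion alternatives.",
             "I was looking for abortion information."]),
]
```

With this made-up data, search(LISTINGS, "abortion") puts the crisis pregnancy center first, the doctor second, and the health center last, purely because of who talked about what. Searching for "contraception" returns nothing at all, since nobody in these reviews happened to use the word.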
Siri seems to rely heavily on Yelp for places, for reasons that are obvious. If it's going to rely on Yelp for places to get an abortion, that means the place either has to be identified by the Yelp system as an abortion provider (Planned Parenthood is not), or people are going to have to talk about how Planned Parenthood provides abortions. Yelp is a social place (sometimes I see people I know in the reviews), and abortions are not something people are inclined to bring up outside of trusted, real-world circles. I think The Abortioneers, who seem to have been the first to notice Siri's reticence, have it exactly right:
We are up in arms because this is not simply a matter of Apple or Siri's original developers being careless about introducing "abortion" into Siri's vocabulary, though that is, indeed, eyebrow-raising. It's a matter of a distinct lack of information about abortion, contraception, resources, and support that is all too prevalent throughout society. Whether or not this was an oversight on the development end (editorial comment: I doubt it. Those guys are basically rocket surgeons.) is irrelevant because at best society made it into something that is acceptable as an oversight.
This doesn't mean Apple is off the hook. Apple's decision to rely on certain databases within a closed system is a choice. As Tom Gruber foreshadowed, "I would compare the cost and benefits of creating an enforceable categorization of possible questions and answers (a Semantic Web approach) with distributed collaborative approaches such as answer gardens or the open directory or the Wikipedia." Siri seems to be somewhere in between: if the patent is any indication, it relies on an enforceable categorization of questions and answers (restaurant ontologies and so forth) that seeks information from distributed collaborative approaches (like Yelp). But Yelp's collaboration only goes so far: given the nature of Yelp, in which people use pictures and somewhat identifiable handles, that collaboration does not seem to extend very frequently to sensitive topics like abortion.
The categorization of questions and answers offers another problem. One of the commenters on The Abortioneers' blog brought up an interesting possibility:
There is a very good chance the initial training dataset used for Siri's AI is male-biased. This is a huge problem in the tech world. Remember how Google+ was opened up to Silicon Valley insiders first, and ended up 90% male?
This is something most data people try to avoid, but remember that Apple didn't develop Siri -- Apple bought another company that had developed the product.
Assuming that's true, is it male-biased because Silicon Valley is male-biased? Or because the commercial aspect of reproductive health is male-biased? Maybe both, and maybe it was a problem on the other end, when it came time to test. To reiterate, it's a closed system, so all you can do is shake it and listen to what rattles around.
What frustrated me was how heated it got, and how quickly certain outlets ran with the "Siri/Apple is sexist/pro-life" meme without considering why. One of the things I like about computers is that they're systems that operate on strict logic, so you can trace the problems more easily to specific places. How often do you get to do that with sex, gender, and reproductive health?
Not that the intensity of the debate should surprise me. Metafilter had a long, informative, and mostly reasonable thread, and you can see the difference between how end-users and developers react to things:
This thread is super interesting! I've learned a lot about how Siri thinks and the programming required to make it work. That being said, as a consumer, I don't need to know the nitty-gritty details of why it has limitations, I just need to know that it does. It's helpful and interesting to know, but if I'm a consumer who needs to find an abortion provider/birth control/whatever, the product does not always do what it is purported to do. In this case, Siri's limitations made me (and a lot of other people) angry.
Do you have any evidence that those things didn't come up in pre-beta testing? Even in the original blog posts reporting on the issue, commenters from other cities and regions have noted that results differed in their cities. What is being suggested in this thread is that products should not ship until women's health and reproductive services are special-cased.
This is why software developers hate users with an undying passion.
This is also a disconnect, separate from if not unrelated to the ones above. As someone who has to work with both sides, I understand where they're coming from. I'm as frustrated as the next user when my CMS has seemingly irrational blind spots, but as frustrated as the next geek when I have to patiently explain why a computer won't do specifically what people want it to do in the way they're asking it to.
It's one of the reasons I like primitive boolean search engines. Not only did I grow up with them, I feel like I'm meeting the computer halfway, that I have to understand how it works in order to work with it. When in Rome, etc. Siri is the opposite: you're supposed to interact with it like a human. It's impressive in the abstract and its development will surely lead to all sorts of technological advances, but given how much of our lives is devoted to dealing with unhelpful, disembodied voices on the phone, the concept of Siri is bound to be problematic in a way that a program without an anthropomorphic veneer isn't. Computer engineer Richard Gaywood explains this in a post, "Why Siri Is Like Skeuomorphic UIs: The magic is just skin deep." Don't let "skeuomorphic" scare you away; just think "uncanny valley":
Where I'd like to go further than Engst does is by drawing comparisons between Siri and Apple's recent trend towards so-called "skeuomorphic" UIs. This is the extensive use of real-world textures and imagery to underpin an app's functionality. Think of iCal on Lion, Calendar on the iPad, Game Center on iOS, or Find My Friends on the iPhone -- with leather bits, and little torn edges, and faux piles of poker chips, and stacks of pages in the corner of the screen.
I have a vehement aesthetic objection to the look-and-feel of most of these apps; I find them pointless, distracting and, frankly, a bit twee. This is merely my own tastes, though. Thinking more objectively I also have a practical objection. I believe that skeuomorphic UIs create false models of interaction. For example, in iBooks there is a stack of pages on the corner of the screen; a swipe across that stack turns the page. Seems logical enough, right? But the same stack of pages in Calendar for iPad on iOS 4 was not swipeable. It looked the same -- and clearly a real-world stack of pages can be turned -- but Apple seemingly just missed this feature out.
Getting back to my original point, I see a link here. Skeuomorphic UIs resemble physical objects, but they cannot hope to emulate the myriad ways we have to emulate physical objects -- so they are always doomed to disappoint on some level if we let ourselves be fooled. Siri presents itself as a real person, a sort of "auditory skeuomorphism" if you will. But short of passing a Turing test one day that, too, is doomed to always disappoint.
Siri is designed, by people, to emulate a person: it's given inputs in human categories, ontologies, told to retrieve information given and organized by humans and then return it to the person using it. This is a problem because Siri is a machine, and machines are dumb: "Ray Kurzweil aside, though, it's difficult to foresee a day when Siri gets so smart that it can parse through the myriad facets of human expression at the level that actual humans can." But human expression is dumb, too. We draw on our experiences to build simulacra of ourselves, and include all the dysfunction we've created without the help of machines. It's like the time a sometimes-anthropomorphized deity allegedly brought forth his latest image, only to find it wasn't as well-designed as he'd hoped.
The test ended with a flood, but like the omnipotent power before it, Apple's decided to leave us in beta and let the system teach itself how to function. And we have ample evidence that the beta development stage takes a while.