Is Algorithmic Judgment Creepy or Wonderful?

For the Nieman Lab’s Predictions for Journalism 2015, Zeynep Tufekci writes that

We’re seeing the birth of a new era, the era of judging machines: machines that calculate not just how to quickly sort a database, or perform a mathematical calculation, but to decide what is “best,” “relevant,” “appropriate,” or “harmful.”

Tufekci believes we’re increasingly “creeped out” by this trend, and she thinks that’s appropriate. It’s not the algorithms themselves that bother her so much as the noiselessness of their presence. Decisions are constantly being made for us without our even realizing it, and those decisions could reshape our lives.

Or, in some cases, save them. At FiveThirtyEight, Andrew Flowers reports on the U.S. Army’s efforts to apply machine-learning techniques to large data sets to develop a predictive tool—an algorithm—that can accurately identify soldiers at highest risk of attempting suicide. The Army has a serious suicide problem, and an algorithm that can help clinicians decide which individuals require additional interventions could help mitigate that problem. The early results are promising:

The model’s predictive abilities were impressive. Those soldiers who were rated in the top 5 percent of risk were responsible for 52 percent of all suicides — they were the needles, and the Army was starting to find them.
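
To make the kind of result Flowers describes more concrete, here is a minimal sketch, in Python, of how one might measure how concentrated observed outcomes are among a model’s highest-risk cases. Everything below is hypothetical and illustrative: the data are simulated, and the logistic regression simply stands in for whatever modeling the Army team actually used.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Toy stand-in for administrative records: 20 features, one binary outcome.
X = rng.normal(size=(10_000, 20))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=10_000) > 2.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Any probabilistic classifier would do; logistic regression keeps it simple.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
risk = model.predict_proba(X_test)[:, 1]  # predicted probability of the outcome

# Concentration check: what share of observed outcomes falls in the
# top 5 percent of predicted risk?
cutoff = np.quantile(risk, 0.95)
in_top_5pct = risk >= cutoff
share = y_test[in_top_5pct].sum() / max(y_test.sum(), 1)
print(f"Top 5% of predicted risk accounts for {share:.0%} of observed outcomes")
```

The 52 percent figure in the quote is a concentration statistic of this kind: it says that observed outcomes cluster heavily in the model’s top risk stratum, not that the model is certain about any individual.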

So which is it? Are algorithmic interventions creepy or wonderful?

I’ve been designing and hawking algorithms to help people assess risks for more than 15 years, so it won’t surprise anyone to hear that I tilt toward the “wonderful” camp. Maybe it’s just the paychecks talking, but consciously, at least, my defense of algorithms starts from the fact that we humans consistently overestimate the power of our intuition. As researchers like Paul Meehl and Phil Tetlock keep showing, we’re not nearly as good at compiling and assessing information as we think we are. So, the baseline condition—unassisted human judgment—is often much worse than we recognize, and there’s lots of room to improve.

Flowers’ story on the Army’s suicide risk-detection efforts offers a case in point. As Flowers notes, “The Army is constructing a high-tech weapon to fight suicide because it’s losing the battle against it.” The status quo, in which clinicians make judgments about these risks without the benefit of explicit predictive modeling, is failing to stem the increase in soldiers’ suicide rates. Under these conditions, the risk-assessing algorithm doesn’t have to work perfectly to have some positive effect. Moving the needle even a little bit in the right direction could save dozens of soldiers’ lives each year.

Where I agree strongly with Tufekci is on the importance of transparency. I want to have algorithms helping me decide what’s most relevant and what the best course of action might be, but I also want to know where and when those algorithms are operating. As someone who builds these kinds of tools, I also want to be able to poke around under the hood. The latter won’t always be possible in the commercial world—algorithms are a form of trade knowledge, and I understand the need for corporations (and freelancers!) to protect their comparative advantages—but informed consent should be a given.


12 Comments

  1. So, thanks for this great post. Great issues raised. First, let me say I have two claims: One, people ARE creeped out by subjective algorithmic decision-making. I wouldn’t say this is unjustified, but I also wouldn’t say oh, let’s not use algorithms because human judgement is so great. From eye-witness testimony to expert analysis, we know that human judgement, recall, etc. can … um, suck. I don’t disagree with you there.

    Two, and this is my key point: we have millennia of grappling with human subjectivity, a whole set of cultural, political and social institutions, norms and mores, etc. Algorithmic subjectivity and algorithmic judgement are going to come with a whole host of NOVEL biases, issues, false positives vs. false negatives, data considerations, and different kinds of errors. Watson can be the smartest and fastest Jeopardy player and still “think” that Toronto is a US city. This is a different error pattern, a weird one by human standards, and one whose implications we have not dealt with. Computational subjectivity, applied to problems for which we have no absolute benchmarks or anchors to calibrate against, is a different level of issue than something like automating parts of piloting and then checking whether flying is safer when the pilot is aided by the computer (which it is).

    Also, there is a huge difference between the kind of algorithmic assist you are talking about and the kind of proprietary, opaque, unaccountable and commercially-motivated algorithms that are enveloping more and more of our digitally mediated interactions, from the personal to the social. If algorithms were just deployed to assist us in making better decisions, or if they were only used for procedures where we have external criteria by which to judge “correct” or “effective”, I’d have some things to say, but not this.

    • Thanks very much for clarifying and amplifying your thinking here.

      At a broad level, I agree with you. If I hadn’t been such a lazy blogger, I would have dug deeper into two aspects of this question: 1) algorithm-only vs. algorithm-assisted decision-making, and 2) the nature of the decisions being made.

      On the first, I have a strong preference for the algorithm-assisted approach—or where that’s not an option, for making as transparent as possible what the algorithm is doing. That way, we can at least be thoughtful about what we’re getting if we want to be. It sounds like you and I see eye to eye on this issue.

      On the second, I’m interested in our emotional responses to algorithms and how that varies with the nature of the decisions to which they’re related. Daniel Kahneman wrote something about this in Thinking, Fast and Slow (pp. 228-229) that stuck with me:

      The aversion to algorithms making decisions that affect humans is rooted in the strong preference that many people have for the natural over the synthetic or artificial. Asked whether they would rather eat an organic or a commercially grown apple, most people prefer the ‘all natural’ one. Even after being informed that the two apples taste the same, have the same nutritional value, and are equally healthful, a majority still prefer the organic fruit…

      The prejudice against algorithms is magnified when the decisions are consequential. [Psychologist Paul] Meehl remarked, ‘I do not quite know how to alleviate the horror some clinicians seem to experience when they envisage a treatable case being denied treatment because a ‘blind, mechanical’ equation misclassifies him.’ In contrast, Meehl and other proponents of algorithms have argued strongly that it is unethical to rely on intuitive judgments for important decisions if an algorithm is available that will make fewer mistakes. Their rational argument is compelling, but it runs against a stubborn psychological reality: for most people, the cause of a mistake matters. The story of a child dying because an algorithm made a mistake is more poignant than the story of the same tragedy occurring as a result of human error, and the difference in emotional intensity is readily translated into a moral preference.

      I won’t presume that Kahneman’s is the last word on this topic, but it certainly sketches the contours of some fascinating terrain.

      • Ah, but it gets more and more complicated the deeper you delve. Meehl’s point is valid that being forced to abide by the decision of an algorithm might leave people who need care unable or ineligible to receive it. The decision of the judges is final, and the judges aren’t even human! I too have been programming “intelligent” responses to human behavior for many years, which (for my purposes) has usually involved figuring out the boundaries of human behavior and programmatically accounting for every possibility. So really, the user has already been boxed in. But in “real life” there are no boxes or cubbyholes to put people in; every psyche is different. This raises two immediate issues with algorithmic responses: First, and obviously, what if the behavior doesn’t fit the expected patterns? What happens when none of the questions on the Army’s suicide test reveal the intent of the suicidal soldier? Obviously that one falls through the cracks. But second, in the long run, how will the test affect the definition of suicidal tendencies? In other words, as the test becomes more and more accurate over time, what will it take to find the outliers? A human? Luck? Will suicidal soldiers learn which behaviors to avoid so they won’t be pulled in and interviewed by the autopsychiatrist machine?

        Lastly, I want to point out that most people would probably prefer to put their lives in the hands of a human being who will try their hardest to save them. We could, today, equip all our airplanes with drone pilots sitting in comfy chairs in Nevada. They wouldn’t have to work long hours (one person could “fly” the plane for part of the flight, and the next shift’s pilots could take over midflight). But the cost of failure for an on-board pilot is the same as for his or her passengers, and I think people like that. If a psychiatrist makes a mistake and someone he or she said was safe commits suicide, it reflects on that psychiatrist’s career, and the psychiatrist presumably feels real pain, whether for the victim or, at the very least, for their own career path (but who thinks that’s ALL they would think about?). So what if an algorithm gets tweaked to make it more accurate, maybe even automatically tweaked? There’s no consequence to an algorithm for being wrong; or at least, it doesn’t worry about being shut down or having its career path altered. It doesn’t worry about anything.

        Cold-blooded decisions are what makes an inquisition.

        My robot (aka keyboard-computer-screen) typed all this. My spell-checker reviewed it AND HAD SOME PROBLEMS WITH IT. I overruled it.

      • I have to admit, I don’t follow the logic of your blanket preference for human control. If a remote pilot or autopilot were associated with, say, a 20-percent decrease in the risk of a crash, wouldn’t you rather reduce your risk than have a “warm-blooded” controller for its own sake? When the plane is crashing, who cares whether or not its controller suffers, too? If having that “skin in the game” produced a lower error rate, sure, but then it’s still the error rate we care about, right?

  2. So, you are talking about the foibles of human judgment. I hadn’t said anything about that in my (also short) piece, and I don’t really disagree. Humans aren’t rational, utility-maximizing, or endowed with perfect memory; human judgment and memory fail routinely. But that is somewhat orthogonal to my point: you are still talking about questions for which there is a right answer, so we can calibrate and have a sense of error.

    So there is question number one: should we use algorithms when there is a right answer and we can test and make the algorithm work better? And question number two: how do we proceed where we have no such anchor, no objective measure by which we can calibrate? That is what I’m talking about.

    There are the issues Kahneman raises in terms of accountability, but that’s not what I’m talking about either. I’m talking about subjective machine judgment (not assisted; I’m not talking about the world of the professional elite) that is being deployed widely, on commercial platforms (and more and more objects), with control belonging mainly to the owners of the platforms (or the objects), in an opaque and unaccountable manner. So I appreciate the discussion, but I don’t think you are objecting to what I’m talking about. These subjective judgments, whether editorial, political, or financial, are deeply contested, and our politics, culture and institutions are all built on grappling with that contestation. Turning it over to machines, almost invisibly, is a way of depoliticizing and disappearing what are deeply political and subjective questions.

    And tech people, especially those in data science, who are in a position to really understand what’s going on (unlike the general public, who see very little), keep responding to arguments like mine with “humans make mistakes, too” or “we can make algorithms better,” as if that’s the world we live in. It’s disappointing because it’s so clear that it is not the world we live in. Algorithms are not being deployed in an accountable fashion, and there is little interest from their owners in probing the fact that they are increasingly making subjective decisions, with no transparency or questioning.

    The problem is political, but it is also technical, because we are not grappling with how algorithmic decision-making has different error patterns than human decision-making. And that’s a whole other ballgame. All this is why I don’t think “but humans make errors, too” is an adequate answer: we aren’t in the territory of error; we are in the territory of judgment.

    • I agree with much of what you say and would only underscore your point about the fallacy of depoliticization. If I can find the time and energy, I’ll try to write a follow-up post in the next couple of weeks that makes more explicit the conditions under which I think algorithmic judgment is most and least welcome.

      In the meantime, though, I’d like to point out that the accountability concerns you raise aren’t new or specific to (explicit, software-based) algorithms. Take, for example, editorial boards at major newspapers and producers at major television networks and radio stations, who have been wielding this kind of opaque and unaccountable power over what we see and hear and read of the world for many, many years. I don’t buy the hype about data-and-software driven versions of these gatekeepers being inherently better than those chambers of (mostly older, almost all white, mostly male) humans, but I don’t see them as inherently worse, either.

      • But you are making my point! Media has a lot of power, exactly because of its gatekeeping and idea-creating/disseminating role. And that’s why we study them, write books about them; we debate and teach media ethics, we protest them, we make laws about them, we have ombudsmen, etc. Media has a lot of power, but we also have all these ways of trying to grapple with that power. And that’s why it matters so much that media has been “mostly older, almost all white, mostly male.” And when a government pressures media, we realize this has implications for that country’s governance… So now we have a new kind of power, also with gatekeeping and social-control roles, also commercially controlled. The answer isn’t that the other one isn’t perfect (of course not!). The answer is: so what are the structures of accountability, understanding, transparency, laws, culture, ethics, etc. that we need for this one? I’ve never made the argument that algorithmic judgment is inherently worse (or better), just that it’s different, that it has real power, and that it is spreading rapidly and not transparently. My take isn’t Andrew Keen on Algorithms. 🙂

  3. Thanks for your response. I wasn’t, however, arguing for “blanket” human control in the way you seem to mean the phrase: a Luddite, no-computers-in-the-cockpits sort of way.

    For the airplane control example, there are really many choices: Algorithmic flight control, human flight control, algorithmic control with human override capabilities, vice-versa, and, if you can figure out how to program it, some sort of system where both are in control — but really, ultimately one has to be master, the other slave. Like pilot and co-pilot, but the co-pilot is a machine. If it were human, then the co-pilot’s sheer willpower could overcome, say, a suicidal pilot intent on killing everyone on board the airplane (there are a number of instances where a suicidal pilot has been blamed for an airliner crash). At least we like to assume a stronger will would overcome a crazy psychopath intent on mayhem. How about two pilots and a machine all deciding together what to do? Or two independently-programmed machines and one pilot? Who would be the arbiter? And if the pilots aren’t on board the plane, can their transmissions be intercepted? No matter where they are, if something goes wrong they’ll care more than a robot will — unless they’re psychopaths, and uncovering psychopaths has proven extremely difficult. Uncovering more-or-less normal people with suicidal tendencies is hard enough.

    Regarding finding suicidal tendencies in veterans, let’s assume an arbitrary 20% difference in failure rate for the purposes of discussion, and that the automated program is better than the human psychiatrist. But who are the missing 20%? All lives are equal, but some are vastly more productive than others. My brother-in-law can’t get a new kidney because other, younger and healthier people need any kidneys that come up for transplantation. A robot didn’t need to make that decision, a medical board did — a whole board, not just one human brain! As far as I know, there is no appeal of their decision. But perhaps a past president (or vice-president) might get a kidney anyway. Maybe Einstein would have too at any age, if he had needed one. Who is to decide?

    When will we have robots smarter than Einstein, or at least able to make the leap to knowing how to mathematically convert matter into energy, or reason that such a thing is possible? How will a robot know if another robot is lying to it?

    I can easily imagine algorithms helping humans make decisions by marvelously providing oodles (a technical term meaning lots) of data, so much that no algorithm can possibly interpret it all; a human brain is needed for that final task. That happens all the time already. That’s what computers are best at, IMHO. That’s what I try to program them to do.

    It frustrates me tremendously when search engines tailor their responses to my queries to what the search engine “thinks” I want. As a researcher, I find it very difficult to use such algorithmic responses to get the contradictory data I need to disprove my hypothesis. Everything comes up roses even when that’s only half the picture. It’s uncanny.

  4. Military personnel have waived the right to sue the government, so attributes such as age, gender, and race can be used in prediction models. The focus of this study may be “grim”, but not necessarily “creepy”, as the Soldiers have implicitly opted in and the outcome presumably results in pre-emptive action to protect the Soldier.

    A “creepy” scenario is the use of these attributes for, say, granting or denying insurance to consumers. There are very strong correlations between many personal traits and outcomes. If we dig deep enough into data-mining DNA, will we create a dystopian future, à la the movie Gattaca?

    It’s difficult to imagine many “wonderful” scenarios where the use of personal information is involved. But the balance of ethics will always hinge on opting in to “share” personal information in exchange for some value; or, as you state, “informed consent”.

  5. Today, for all normal flight operations, autopilots are much, much safer than human pilots (if only because they don’t fall asleep), and no one really wants to fly any other way, if only to keep the pilots fresh for when things get crazy: nonworking engines, broken instruments, controls responding poorly, radio out, nowhere to land, leaking fuel… it might be a while before software autopilots can routinely handle something like that. But statistically, does it matter? Probably not. Certainly for normal airline flight operations, we already hardly need pilots, and perhaps they already get in the way as much as they help, statistically. And they greatly bump up the cost.

    Yet even riding on a monorail with no operator still feels funny. Nevertheless, it was a human driver who almost backed over my brother last week. A bit of incoherent yelling stopped a potential tragedy. Having to clearly say, “Okay, Google” first wouldn’t have worked.

