What the Enron E-mails Say About Us

Scholars have spent years analyzing the corporation’s vast digital archive. What have they discovered?

July 17, 2017

The Enron corpus provided a data dump of workplace communication styles.

A measure of industrial progress is the speed with which inventions grow insufferable. The elevator, once a marvel of efficiency, has become a social purgatory from which most of us cannot escape too quickly. The builders of the first commercial airplane couldn’t have foreseen the crushed knees and the splattered salad dressings that their machine would visit on the world. “Hitherto it is questionable if all the mechanical inventions yet made have lightened the day’s toil of any human being,” John Stuart Mill wrote in the “Principles of Political Economy” (1848), and the precept holds for recent innovations, too. Think of e-mail. Or, rather, try not to think of e-mail, since, chances are, while you floss, steep tea, make love, or read these sentences, new messages are proliferating in your inbox, colonizing your time and your brain. Sure thing, you type back to a needy stranger who seems unable to punctuate. Sounds good. Actually, it sounds like death. Once upon a time, you knew that you could log off e-mail and, like Cinderella before midnight, gain a few hours of deliverance from the day’s digital scut work. Now your inbox nags you on your smartphone, and the only prince who might help is Nigerian, with a need to stow his fortune somewhere safe.

Add to this the knowledge that your e-mail self is probably your worst. “Exposure of my emails would reveal not only deep fears and worries, but also my shallow personality,” the writer Delia Ephron fretted in a comic essay, after Sony, where she’d done business, had its accounts hacked. That was in 2014, and the stakes of inbox security have risen since, even as standards of conduct grow vague. By some accounts, it was a popular obsession with Hillary Clinton’s inbox which cost her the election. By others, it was WikiLeaks’ release of messages from the Democratic National Committee. E-mails from the Vice-President’s former account showed up in March (divulging the Second Lady’s private contact information), and, in May, hackers delivered a cache from Emmanuel Macron’s campaign inboxes in the apparent hope of swaying voters. (The press held back, and the people of France, who appear to prefer their epistolary scandals served blue, shrugged.) E-mail made the news again last week, when the Times reported that a message from 2016 offered Donald Trump, Jr., opposition information from Russia. Then Trump fils released his e-mail thread online.

Given that e-mail leaks can imperil governments, it seems odd that correspondents spend so little time reviewing basic work before they press send. Writing, along with fire-making and the invention of the wheel, is widely held to be a milestone of human progress. This view will seem naïve to anybody who has read much human writing. In its feral form, prose is unhinged, mystifying, and repetitive. Writers feel moved to “get things down on paper,” usually incoherently, and even in guarded moods say alarming stuff because they don’t know where to put their commas. (“Time to eat children!”) The true wellspring of civilization isn’t writing; it is editing. E-mail, produced in haste, rarely receives the requisite attention. That is bad for us but good for posterity—and for students of the literary gestures we imprudently put in pixels. When inboxes are gathered, cracked open, and studied, they become a searchable, sortable atlas for the contours of our social minds.

Not long after the Enron Corporation imploded amid revelations of accounting fraud, in 2001, the Federal Energy Regulatory Commission seized the e-mail folders of a hundred and fifty-one mostly high-ranking employees, the better to discover the discoverable. Before long, the commission made a startling announcement: it would release this body of e-mail online, to substantiate its findings. “The release of the information now will enable the public to understand better the evidentiary record on which the Commission’s decisions in those proceedings are grounded,” it explained. “The Commission may release the information if the public’s right to disclosure outweighs the individual’s right to privacy.”

The Enron archive came to comprise hundreds of thousands of messages, and remains one of the country’s largest private e-mail corpora turned public. Its lasting value is less as an account of Enron’s daywork than as a social and linguistic data pool, a record of the way we write online when we’re not preening for the public eye. Like a hot-dog bun beset by seagulls, the archive has been pulled apart and pecked up; it has been digested by computers and referred to by more than three thousand academic papers. This makes it, in the annals of scholarship, something strange: a canonic research text that no one has actually read.

Mostly, that’s because it is too long, and too boring, for complete human consumption. When the e-mails were released, in 2003, the dump was more jumbled than even computers could handle, so a researcher at M.I.T. purchased the bundle and, with help, began to put it in a processable order. Folder structures were reinstated. Redundancies, automated messages from Listservs, delivery-failure notices, and other pieces of modern detritus were trimmed away.

The resulting corpus, down to a few hundred thousand e-mails, helped to mark a shift in research premise from the cult of authorship (these texts are interesting because a notable mind made them) to the cult of the commons (these texts are interesting because of what, together, they show). The things they show frequently serve the cause of automation. One of the first projects to employ the Enron corpus was a self-described “extensive benchmark study of e-mail foldering.” It used seven large accounts to help determine whether people organized their e-mail in ways that might be replicable by machine intelligence. (“Email foldering is a rich and interesting task,” the study’s lead author, Ron Bekkerman, noted, in what may be the paper’s most surprising conclusion.) The answer was not yet: people are too idiosyncratic in the ways they organize their stuff. Another team used the corpus to develop a “compliance bot” that could identify sensitive elements in text and alert writers if a message might get them in trouble.

These endeavors served a basic purpose: protecting users from their foolishness. Other studies focussed on Enron itself. Noting that “a small number of users have sent a large number of messages”—a fact that will shock no one who gets e-mail at work—one research team mapped epistolary ties on a Gower layout (a connect-the-dots plot) to understand who was in contact with whom. They found a tight nest of connections around Enron’s president, vice-president, and C.E.O. Angled off to either side were ears with more remote networks of traders, managers, and lawyers. The plot looks like a donkey head.

It also looks more or less like what you’d expect. The corpus rapidly highlights the difference between rich data and useful information. An M.I.T. student working on a compliance bot noted that it seemed nearly impossible to identify evidence of financial misconduct using basic search strings. He had more success tracking down pornography—of which there was, oddly, a lot—with words like “sex.” Also, it was easy to find racial slurs.

Computers can do little with a text that humans could not, but they make some laborious work go faster. In 1949, an Italian Jesuit priest named Roberto Busa presented a pitch to Thomas J. Watson, of I.B.M. Busa was trained in philosophy, and had just published his thesis on St. Thomas Aquinas, the Catholic theologian with a famously unmanageable œuvre. (Work on a multivolume critical edition of Aquinas’s philosophy, commissioned by the Vatican, began in 1879 and is nowhere near done.) Busa had begun to wonder whether Watson’s computing machines could aid his work. Watson backed him, and, for the next thirty years, Busa encoded sixty-five thousand pages of Thomist text so that it could be word-searched, cross-referenced, and what we now call hyperlinked. The Index Thomisticus was the first corpus to be primed for digital scholarship, no less impressive because it started on punch cards and ended up online. “Digitus Dei est hic!” Busa punned in 2004. The finger of God is here.

By then, using computers to assess large bodies of written text had turned to profane projects. Computational linguistics, the study of computer-replicable rules and patterns in real-world language, began in earnest in the nineteen-fifties, originally in the service of Cold War intelligence: the United States wanted to use computers to mass-translate Russian texts into English. (The U.S.S.R., of course, wanted the opposite.) By the late sixties, the endeavor had reached literary commerce. Houghton Mifflin used the so-called Brown corpus, a body of five hundred varied texts from 1961, to produce the first edition of the American Heritage Dictionary of the English Language, in 1969: one of the earliest reference guides that included descriptive information about the way words were actually deployed in print. Research on so-called corpus linguistics revealed some puzzling properties of usage. In the thirties, the linguist George Kingsley Zipf had posited that a word’s frequency is inversely proportional to its rank in the frequency table—the third most common word would show up one-third as often as the most common word, and on—and the Brown corpus and others have appeared to bear this out. Zipfian projections are inexact, especially far down the table, but the curve seems to hold broadly. It is unclear why.
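For readers who want to see the arithmetic, Zipf's proposal can be checked in a few lines of Python. What follows is a minimal sketch, not the procedure used on the Brown corpus: the helper names `rank_frequency` and `zipf_prediction` and the toy sample sentence are assumptions made for illustration, and a real test would need a far longer text.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, count) tuples, most common word first."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    ranked = Counter(words).most_common()
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ranked, start=1)]


def zipf_prediction(top_count, rank):
    """Idealized Zipfian count: the top word's count divided by the rank."""
    return top_count / rank


# Toy sample, far too small to confirm or refute anything.
sample = "the cat and the dog and the bird"
table = rank_frequency(sample)
top_count = table[0][2]

for rank, word, count in table:
    # Observed count beside the count an exact inverse-rank curve would give.
    print(rank, word, count, round(zipf_prediction(top_count, rank), 2))
```

Run against a long text, the observed and predicted columns should track each other near the top of the table and drift apart lower down, which is the inexactness described above.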

A field known as digital humanities has emerged around text-crunching analysis in its modern form. A key advocate of the method, Willard McCarty, touted computers’ virtues as “modelling machines”: they can test and discard working theories without years of exploratory work. Textual mapping is a popular function; a recent project, in Denmark, used artificial intelligence to comb through thirty thousand witchy folktales and geographically plot their elements. (It revealed, among other things, that witchcraft allegations in Protestant Denmark tended to arise in the vicinity of Catholic monasteries.) And, because computers are great at searching, they have been a boon for stylistics: the study of the words, phrases, or images that recur across a work. Such analysis, in its eccentric span, includes Robots Reading Vogue, a project at Yale’s Digital Humanities Lab which, drawing on archived correspondence, gins up memos in the scattered style of Diana Vreeland. “Also small stones, small straps. It would be interesting, and Diane de Mere, etc., . . . The marvelous summer look,” some computer-generated Vreelandisms read. Although the project is amusing, coming up with nonsense is the one thing with which humans need no help.

Still, these behavior-patterning approaches produce insights when applied to the Enron corpus. A pair of researchers at Queen’s University, in Canada, had some success applying “deception theory”: the idea is that disingenuous e-mailers tend to minimize first-person pronouns, use more negative-emotion and action words, and write with “an excessive blandness.” Their search turned up a number of misconduct-related e-mails, although further analysis was still required as a final filter.
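The first two of those cues are simple enough to count by machine. The sketch below is an illustration under loud assumptions, not the Queen's researchers' model: the word lists are hypothetical stand-ins for a proper lexicon, the function name `deception_features` is invented here, and the output is a pair of raw rates that a human would still have to interpret.

```python
import re

# Hypothetical word lists for illustration; a real study would use a vetted lexicon.
FIRST_PERSON = {"i", "me", "my", "mine", "myself", "we", "us", "our", "ours"}
NEGATIVE_WORDS = {"afraid", "angry", "bad", "hate", "problem", "worried"}


def deception_features(message):
    """Return the share of first-person and negative-emotion words in a message."""
    words = re.findall(r"[a-z']+", message.lower())
    total = len(words) or 1  # avoid dividing by zero on an empty message
    return {
        "first_person_rate": sum(w in FIRST_PERSON for w in words) / total,
        "negative_rate": sum(w in NEGATIVE_WORDS for w in words) / total,
    }


features = deception_features("I hate my problem")
print(features)
```

On the theory as described, a message with a low first-person rate and a high negative rate would merit a second look; the numbers alone settle nothing, which is why the final filter remained a human one.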

Other projects got more specific. A 2011 study from the University of Washington crawled through the e-mails to see how tonal formality tracked onto the nature of a message, rank difference, social familiarity, and the number of recipients. Most results were unsurprising: people e-mailed more formally when dealing with business, across a gap in rank, with people they scarcely knew, and to a bigger audience. Oddly, though, e-mails grew more informal as the list of addressees expanded beyond ten. The researchers hypothesized that people like to strike a slouchy pose before big workplace audiences, the better to seem the cool kid in a class of dweebs.

In the way that years have springtimes, most epistolary careers have a swell. Maybe yours came in July, at camp, when 4 p.m. felt like a lonely hour. Maybe it started in the season that arrives after a failure or a death, or in the crisp evening that closes a lucky day. Mine arrived when I was a college exchange student in France: four classes, few friends, and a shared apartment across from a fire station where, most mornings, pompiers paraded out onto the sidewalk to unroll, and then reroll, their hoses. I would go to a creaking amphitheatre to watch a lecture by a preening giant of French literary theory. I would continue to a small room where a scholar with a prim, babylike mouth read verbatim from an outline, which the students dutifully copied onto pristine quadrille paper using fountain pens. At lunchtime, I’d sit in the park with a €2.80 sandwich and write letters across the tops, bottoms, and backs of greeting cards, descanting on random but—I believed—revealing details. The French magazines photographed intellectuals in odalisque poses, I’d report. The stations on the Clignancourt-Orléans line smelled like baking yams. When I think back on this period, what strikes me most is how fresh my flint was, how the lightest brush with a larger world could scatter sparks, smoke up my eyes, burn through hundreds of words. My e-mails, horrifyingly, would run longer still.

The Enron corpus seems unburdened by such correspondence. “Where are you right now? i am in london,” Greg Whalley, the company’s president after Jeff Skilling’s departure, wrote a colleague inquiring about a meeting. “Congratulations! Keep up the good work,” Teb Lokey, a manager for regulatory affairs, tells an employee. (That is the whole message.) An analyst found about half of the e-mails to be one sentence long, and those that run on aren’t always more substantive.

When the Enron corpus first became available, some people described its catalogue of tics and corporatese as “cliché”—less embarrassing to Enron, possibly, than to the species. (Who among us has not stood atop millennia of human language and, after a moment of reflection, signed an e-mail “Best”?) To the extent that “cliché” is another word for recurring cultural pattern, these platitudes are exactly what computer analysis embraces.

In 2014, an enterprising business-English teacher named Evan Frendo had the idea of using the corpus to locate phrases helpful to the foreign businessperson working with Americans. After what must have been punishing study, he discovered a fixation on “ball” metaphors. “I thought I’d get the ball rolling,” one Enroner wrote. “Sounds like you guys had a ball at dinner,” another said. “I played hard ball and told them that I had to have more time,” a correspondent reported. “Someone REALLY dropped the ball here!” an employee chides. “From June 1, we will be totally on the ball,” reads an e-mail that you don’t believe. “I will pretty much leave it in your ball park about Friday night,” somebody writes (a message that Frendo correctly annotates “???”). All told, the corpus contained six hundred and two instances of ball speech, apparently covering every scenario in modern American business. It is not clear that this compendium eases the task of the Danish banker on a morning flight to Dallas. But perhaps it tells him where to focus his study.

Naomi Lancaster, a graduate student at Ball State University (!), established that Enroners didn’t generally open with “Dear,” as most etiquette guides suggest, and favored “Hey,” “Hi,” or “Hello,” leading Lancaster to believe that the etiquette proxy for e-mail wasn’t written letters but speech. Only six per cent of the e-mails she examined had any greeting at all; most began in medias res. The employees most likely to use a friendly greeting were women not in positions of authority, followed by men in subservient positions. Powerful men were the most likely just to open an e-mail window and start typing. In some cases, an e-mail would simply be addressed “Guys.”

The challenge of beginnings is not particular to e-mail—nor are its gender condescensions new. “Strange as it may seem, we continue to receive letters from people interested in the problem—broached by us last June—of the correct salutation to use in a letter to a girls’ school,” E. B. White and Elizabeth Hawes wrote in the Notes and Comment department of this magazine, in 1931:

First there is a communication from Thomas O. Mabbott, Ph.D., assistant professor at Hunter College, who says that the head of his department writes, “Dear Colleagues.” . . . An etiquette writer in the World-Telegram, propounding the same problem, by a funny coincidence, advises the use of the French “Mesdames,” followed, the writer goes on, “by the customary dash.” A man in Baltimore writes that the Governor of the Virgin Islands once wrote a letter to Goucher College beginning: “To the director of one group of virgins from another,” which we neither believe nor think funny.

A letter, like the social speech for which it substitutes, is frayed by awkwardness at either end. We spend half of our lives struggling to start conversations and the other half struggling to exit them. In the middle is the thing itself, and here, it turns out, we are slightly better than machines. What is sometimes called “sentiment” or “tone” analysis presents a challenge for computers, which can stumble over simple words. Consider “pretty”: it can intensify some descriptions (“The hot dog was pretty amazing, but the bun was pretty dry”), dial back others (“That Zumba class was pretty good, I guess”), convey beauty (“What a pretty wooden trellis!”), or add irony (“What a pretty kettle of fish”).

The limits of corpus analysis, in other words, are human; in the gap between data and knowledge, we fall back on our social understandings of the world. This recourse can help computers with complex use cases, such as “pretty.” But when help is supposed to flow from machine to human, we can end up gazing into a mirror, not a clarifying lens. Like the work of the midcentury structuralist anthropologists, corpus analysis purports to pattern-seek dispassionately. The endeavor, though, requires focussing on certain patterns over others, and imbuing them with a relational logic based on what’s already known. We learn as much about our social selves in the act of interpreting the Enron corpus as we do in the e-mails themselves. Behind the meaning of the commons, there’s an author still.

In the iconoclastic 1980 book “Is There a Text in This Class?” Stanley Fish attacked the field of stylistics, and the tendency to equate the work of the humanities researcher with the work of the scientist. The equivalence was false, Fish thought, because the inquiries had different goals. Scientists were trying to zero in on something fixed and unknown: the laws of nature and their potential applications. Humanists were working with something variable and contingent: the way a text produced meaning for a given group of readers. You could turn up patterns in any long piece of writing without showing that such patterns were germane to how the work communicated. The most revealing question about a piece of text was the obvious one: How does it mean?

This is the question least scrutinized in the Enron corpus, perhaps because reading two hundred thousand e-mails, let alone finding a unified, intended narrative in them, seems a hopeless project. But it is not until you descend from thirty-seven thousand feet that life starts coming into focus once again.

Personalities turn out to matter; stories, too. Small, sometimes moving dramas unwind in the folders of sent mail. In May, 2001, a trader who is given to enthusiastic, exclamation-laden e-mails tells a friend that it’s already getting hot in Houston, which is a pain, because he’s begun jogging again, to lose 8.5 pounds. He has just been through a breakup. A vice-president is having a custody battle in September, 2001, and sends a legal aide a frenzied, unedited, and wrenching plea: “How can she be aloud to keep me from my son?” Some of the most interesting messages were never meant for anyone else’s eyes. That same jogger, still romantically at loose ends, e-mails his Hotmail account a link to workouts on fitnessheaven.com. An employee on the legal team sends his personal AOL account a joke he may have found worth mastering. (“Moses, Jesus and an old man are golfing,” it begins.) “Do you know what’s included in Enron’s Code of Ethics?” an e-mail advertising an in-house informational event prompts. “Do you know what policies affect corporate conduct? Ask Sharon Butcher, Assistant General Counsel of Corporate Legal, all your questions about our corporate policies today.” The message was sent on June 5, 2001. Ten weeks later, Jeffrey Skilling resigned as president and C.E.O. A programmed search could find this e-mail, but it wouldn’t be able to locate the irony. For this, we need the same human instrument—faulty, romantic, and duplicitous—that brought Enron to that self-defeating point.

The tendency to weave stories where evidence is missing is the human brain’s sustaining feature, precipitating heroic action, senseless love, and mindless hate. Broadening the data pool has no chance of dissolving these delusions, because people generally deal with huge volumes of information in the same way that they deal with small ones: by sifting and discarding, then connecting dots to make a picture out of what remains. They latch onto results that bear out narrative and hopeful theory. They seek a private order in the chaos of the world.

When the Enron scandal broke, last decade, e-mail was the most wanton kind of media. It is no longer so—people now have indecent texts at home, manic Slack threads in the workplace, and, for just about every venue, crankish, boastful Facebook, filled with babies and bad news. As the scandals of the past few years show, however, indecorum hasn’t left our inboxes, and the lives behind the @ symbol may still have something to hide. For many of us, that seems all right. The urgent project at the moment isn’t adding more information to the cultural file; it is understanding how meaning is produced, how stories wrought from narrow data samples seed and grow in the public imagination. Such work will tell us more about contemporary communication than another e-mail archive. As a sign of twenty-first-century progress, it can’t come too soon. ♦
