  • FUTURES

The list

Two mince pies sit on a plate

Illustration by Jacey

Marina gritted her teeth against the cold. She needed to focus. For six months, she’d been living in a tiny village surrounded by frozen tundra, waiting for this moment. The door creaked, and a welcome cloud of warmth curled into the frigid air. A man — she would’ve guessed mid-sixties — was grinning at her, sporting an over-sized dressing gown and loose slippers. “So, you’re the one pestering Alfy Shenk every time he enters the village.”

“Guilty.” She smiled. “Marina Burgess, senior data engineer with GDM. And you are …?”

“Not what you expected, evidently.” He patted his belly. “Trust me, by the end of the year, the weight will come back. Come on in, tea’s brewing.”

Marina scanned the tiny cottage and took a seat in an armchair in front of the fire. Everything was so … mundane. From the battered kettle hanging above the fire to the wooden staircase spiralling into a shimmering vortex. OK, the vortex was weird.

“Quite the eyesore, isn’t it,” he said, filling two mugs from the kettle. “I tried hanging a curtain, but it got sucked in, and inanimate objects don’t transition well. Nor do humans, for that matter, so don’t do anything silly.” He sank into the other armchair. “Alfy explained the time dilation?”

Marina nodded.

“That’s why this meeting has to be so brief, I’m afraid.”

“Then I’ll get right to the point. GDM specializes in using deep neural networks to model complex human data. In your case, the model output would be binary classification, and the inputs would be each individual’s behavioural information from the past year. From the training data Mr Shenk gave me —”

He chuckled. Thick and throaty. “Sneaky devil. He didn’t tell me about that.”

Marina swallowed, but he didn’t seem annoyed. Anger would be a little hypocritical as he created The List without asking the participants: a complication that put the data into a legal grey area. But for the right payoff, GDM would work with grey. “The data were anonymized, naturally, and the behavioural specifics were minimized to make identifying individuals impossible. Even so, we achieved a classification accuracy of 98%.”

For the first time, he seemed surprised. “Really?”

“Yes, based on cross-validated results from 10,000 samples. With billions, we could do even better. And of course, you can manually review the edge cases where the network has low confidence. I understand that it typically takes 1,000 years for your team to process The List?”

“Try 2,000.” He sipped the tea. “We have to check it twice. But that’s why we do it in dilated time.”

“Yes, Mr Shenk explained that.” Marina placed her tea to the side, untouched, and folded her hands in her lap to stop them shaking. She was so close! “He also suggested that the dilation ratio can’t cope with the increasing population, causing your recent list reviews to become … cursory.”

“That little —” He took a breath, and let it out slowly. “I won’t deny it. We reached the limit of stable dilation in the fifties. But the population keeps growing, and …” He sighed. “Mistakes have been made.”

“It’s an impossible situation.” Should she put a hand on his knee? No. Too much. “But that’s why we can help. With enough training data from previous years, our system could process your upcoming list in approximately two weeks, local time.”

“These edge cases you mentioned. How many are there?”

“Less than 2%. We anticipate reducing your workload by a factor of 50.”

“And how does your network extract the intent behind the behaviours? That’s our bottleneck in manual review.”

“I can’t tell you exactly. Intent is clear to a human reader, but how the network represents that information isn’t intuitive. Still, the results speak for themselves.”

“Intent is so important,” he mused. “The same action can move you to the top of either list depending on the reasoning behind it.” Another sip, another pause. “I can’t compensate you. It’s against the rules.”

“We aren’t asking for financial compensation.”

“You’re doing this out of the kindness of your heart, I suppose?”

Marina hesitated, but it seemed unwise to try to deceive someone who knew every action she’d taken since birth. “Our payment would be the anonymized list itself. My company has a number of mechanisms for obtaining behavioural data: device tracking, Internet activity, purchase history, the usual. But your data set is on a scale I can barely comprehend. It’s immensely valuable.”

“Well, we have some tricks that even GDM don’t know about.” He glanced at a clock perched above the fire and groaned. “Unfortunately, I must return to dilated time, but thank you for visiting.”

“No, thank you.” Marina stood.

“Please remain in the village another week. I’ll send Alfy along to make the arrangements. Naturally, this year, we’ll manually process The List in parallel and compare the results. Then perhaps we can move forward from there.”

“Excellent!” Marina said, trying to keep her voice steady. “I look forward to working with you.”

“Likewise.” He opened the door. “I should probably warn you: those tricks I mentioned can be used to erase the data set at any time, no matter where or how you store it. I hope that’s understood.”

Marina paused on the doorstep, and turned back. “I want to assure you that your trust isn’t misplaced. GDM has an impeccable reputation for ethical data use. Last year, we helped —”

That chuckle again, deep as an ocean. “Relax, Marina,” he said, closing the door. “It’s fashionable to be on the naughty list these days.”

The story behind the story

Pip Coen reveals the inspiration behind The list

The list has been sitting on my virtual notepad for a while under the heading ‘Santa’s misery’. It started with a mulled-wine-inspired debate regarding the worst part of Santa’s job. I’ve never been a fan of paperwork — as our lab manager can attest — so I have no doubt that making a list with 7 billion entries is more exhausting than globetrotting with magical reindeer or honing your practical skills in the workshop. And that’s not even accounting for having to check it twice!

A few months later, during lockdown, I came across this uninspired kernel in my notepad while running some binary classification analysis (i.e. while procrastinating). It struck me that no sensible individual would manually check a list of 7 billion entries: you’d train a classifier to do the job instead. And naturally, anyone with access to the data set underlying that analysis would be the envy of our data-hungry world.

Once the two ideas came together, I immediately started writing … I mean, I finished the data analysis and then started writing The list. Hopefully, you enjoyed the finished result!

P.S. I may, or may not, still be working on the same binary classification problem.

doi: https://doi.org/10.1038/d41586-020-03617-2
