It’s been said that the uncharted territories of the old maps were often marked with the ominous warning: “Here be dragons.” Perhaps apocryphal, the idea was that no one wandering into these unknown corners of the world should do so without being ready to battle a terrifying foe. Anything could happen in these mysterious regions, and often that anything wasn’t good.
Programmers may be a bit more civilized than medieval knights, but that doesn’t mean the modern technical world doesn’t have its share of technical dragons waiting for us in unforeseen places: Difficult problems that wait until the deadline is minutes away; complications that have read the manual and know what isn’t well-specified; evil dragons that know how to sneak in inchoate bugs and untimely glitches, often right after the code is committed.
There will be some who rest quietly at night, warmed by their naive self-assurance that computers are utterly predictable, earnestly churning out the right answers. Oh, how little they know. For all the hard work of chip designers, language developers, and millions of programmers everywhere, there are still thorny thickets of programming problems that can bring even the mightiest programmers to their knees.
Here are seven of the gnarliest corners of the programming world where we’d put large markers reading, “Here be dragons.”
It sounded like a good idea: Break your program into independent sections and let the OS run them like separate little programs. If the processors have four, six, eight, or even more cores, why not write your code so that it can have four, six, eight, or more threads using all of the cores independently?
The idea works—when the parts are in fact completely separate and have nothing to do with one another. But once they need to access the same variables or write bits to the same files, all bets are off. One of the threads is going to get to the data first and you can’t predict which thread it will be.
Thus, we create monitors, semaphores, and other tools for organizing the multithreaded mess. When they work, they work. They merely add another layer of complexity and turn the act of storing data in a variable into an item that requires a bit more thought.
When they don’t work, it’s pure chaos. The data doesn’t make sense. The columns don’t add up. Money disappears from accounts with a poof. It’s all bits in memory. And good luck trying to pin down any of it. Most of the time developers end up locking down big chunks of the data structure so that only one thread can touch it. That may stem the chaos, but only by killing most of the upside of having multiple threads working on the same data. You might as well rewrite it as a “single-threaded” program.
The name doesn’t help—it’s not like access is closed down permanently like a bar announcing last call. If anything, access is open but only through a wormhole in the data-time continuum, a strange time-shifting mechanism that is bound to eventually spawn a sci-fi TV show. But calling it a “Complex Stack Access Mechanism” or “Data Control Juggling System” seems too long, so we’re stuck with “closures.” Don’t get me started on whether anyone needs to pay for the nonfree variables.
Too big data
When RAM starts filling up, everything starts going wrong. It doesn’t matter if you’re performing newfangled statistical analysis of consumer data or working on a boring, old spreadsheet. When the machine runs out of RAM, it turns to so-called virtual memory that spills out into the superslow hard disk. It’s better than crashing completely or ending the job, but boy does everything slow down.
The problem is that hard disks are at least 20 or 30 times slower than RAM and the mass-market disk drives are often slower. If some other process is also trying to write or read from the disk, everything becomes dramatically worse because the drives can do only one thing at a time.
Activating the virtual memory exacerbates other, hidden problems with your software. If there are threading glitches, they start to break much faster because the threads that are stuck out in the hard disk virtual memory run so much slower than the other threads. That only lasts a brief period, though, because the once wallflower threads get swapped into memory and the other threads hang up. If the code is perfect, the result is merely much slower. If it’s not, the flaws quickly send it crashing into disaster. That’s one small example.
Managing this is a real challenge for programmers who are working with big piles of data. Anyone who gets a bit sloppy with building wasteful data structures ends up with code that slows to a crawl in production. It may work fine with a few test cases, but real loads send it spiraling into failure.
Anyone with a university education in computer science knows of the mysterious problems wrapped in an acronym that’s rarely spelled out: nondeterministic polynomial complete, aka NP-complete. The details often take an entire semester to learn, and even then, many CS students come out with a foggy notion that no one can solve these problems because they’re too hard.
The NP-complete problems often are quite difficult—if you attack them simply with brute force. The “traveling salesman problem,” for example, can take an exponentially long time as the sales route includes more and more cities. Solving a “knapsack problem” by finding a subset of numbers that come the closest to some value N are solved by trying all possible subsets, which is a very big number. Everyone runs with fear from these problems because they’re the perfect example of one of the biggest bogeymen in Silicon Valley: algorithms that won’t scale.
The tricky part is that some NP-complete problems are easy to solve with an approximation. The algorithms don’t promise the exact solution, but they come pretty close. They may not find the perfect route for the traveling salesman, but they can come within a few percentage points of the right answer.
The existence of these pretty good solutions only make the dragons more mysterious. No one can be sure if the problems are truly hard or easy enough if you’re willing to be satisfied by an answer that’s just good enough.
“There are known knowns; there are things we know we know,” Donald Rumsfeld, the Secretary of Defense during the second Bush administration, once said at a press conference. “We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns—the ones we don’t know we don’t know.”
Rumsfeld was talking about the war in Iraq, but the same holds true for computer security. The biggest problems are holes that we don’t even know are possible. Everyone understands that you should make your password hard to guess—that’s a known known. But who has ever been told that your networking hardware has its own software layer buried inside? The possibility that someone could skip hacking your OS and instead target this secret layer is an unknown unknown.
The possibility of that kind of hack may not be unknown to you now, but what if there are others? We have no clue if we can harden the holes we don’t even know exist. You can batten down the passwords, but there are cracks you can’t even imagine. That’s the fun of working with computer security. And when it comes to programming, security-minded thinking is becoming ever more important. You can’t leave it to the security pros to clean up your mess.
Encryption sounds powerful and impenetrable when law enforcement officials get in front of Congress and ask for official loopholes to stop it. The problem is that most encryption is built on a foggy cloud of uncertainty. What mathematical proofs we have rest on uncertain assumptions, like it’s hard to factor really big numbers or compute a discrete log.
Are those problems truly hard? No one has publicly described any algorithms for breaking them, but that doesn’t mean the solutions don’t exist. If you found a way to eavesdrop on every conversation and break into any bank would you promptly tell the world and help them plug the holes? Or would you remain silent?
The real challenge is using encryption in our own code. Even if we trust that the basic algorithms are secure, there’s much work to be done juggling passwords, keys, and connections. If you make one mistake and leave a password unprotected, everything falls open.
Everyone loves that New Yorker cartoon with the punchline, “On the internet, nobody knows you’re a dog.” It even has its own Wikipedia page with four elaborate sections. (On the internet, nobody knows the old saw about analyzing humor and dissecting frogs.)
The good news is that anonymity can be liberating and useful. The bad news is that we have no clue how to do anything but anonymous communications. Some programmers talk about “two-factor authentication,” but the smart ones jump to “N-factor authentication.”
After the password and maybe a text message to a cellphone, we don’t have much that’s very stable. Fingerprint readers look impressive, but plenty of people seem willing to divulge how they can be hacked (see here, here, and here for starters).
Not much of this matters to the world of idle chatter on Snapchat or Reddit, but the stream of hacked Facebook pages are a bit disconcerting. There’s no easy way to handle serious matters like property, money, health care, or pretty much everything else in life except meaningless small talk. The bitcoin fanbois love to prattle on about how rock-solid the blockchain may be, but somehow the coins keep getting ripped off (see here and here). We have no real method to handle identity.
Of course, when it comes to programming, is there even a way by which we can measure the difficulty of a problem? No one really knows. We do know that some problems are easy to solve, but it’s entirely different to certify one as hard. NP-completeness is only one part of an elaborate attempt to codify the complexity of algorithms and data analysis. The theory is helpful, but it can’t offer any guarantees. It’s tempting to say that it’s hard to even know whether a problem is hard, but well, you get the joke.