If you take a look at the list of trending repositories on GitHub, you’ll see amazing code from programmers who live around the world and efforts from firms big and small. But one thing you don’t often see is work that comes from university labs. It’s rare for the next big thing to escape from an academic computer science department and capture the attention of the world.
That’s not a knock on university research. But competing with open source projects that enjoy broad support across the industry and around the world is challenging for a handful of academics and grad students. Sure, many of the top computer science schools are well off, but that doesn’t mean the money is pouring into research. Open source programmers, on the other hand, can usually build better code faster, often because they have bosses who pay them to build something that will pay off next quarter, not next century.
Yet good computer science departments still manage to punch above -- sometimes well above -- their weight. While a good part of the research is devoted to arcane topics like the philosophical limits of computation, some of it can be tremendously useful for the world at large.
What follows are nine projects currently under development at university labs that are worth your attention. They may not be the absolute best or furthest along, but each has the potential to have a broad impact on the world of computing. Some offer shipping code, others offer mostly potential, but all offer a straightforward path for transforming our world with useful computation.
Big data is one area where academia’s focus on mathematical foundations can pay off, and one of the more prominent packages to gain attention of late is DeepDive, a tool for exploring unstructured text. While many big data projects work with well-structured information that’s already in tables, DeepDive focuses on finding correlations in raw text files and other files that aren’t organized.
The Java code runs a pipeline that pushes the raw data through a set of tools that parse natural language into streams of entities -- that is, people, places, companies, or things. Then it uses statistical algorithms to search for connections among the entities, even if they’re not explicitly spelled out. These results are then boiled down to clear inferences and inserted into an old-school database.
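To make the pipeline idea concrete, here is a minimal Python sketch. It is not DeepDive's code or API. It assumes the entity extraction has already happened (the hand-made `sentences` list stands in for the output of a natural-language parser) and it uses plain co-occurrence counting as a crude stand-in for the statistical inference. Every name in it is hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: the set of entities a parser found in each sentence.
sentences = [
    {"Ada Lovelace", "Charles Babbage", "London"},
    {"Ada Lovelace", "Charles Babbage"},
    {"Charles Babbage", "London"},
    {"Ada Lovelace", "Charles Babbage", "Analytical Engine"},
]

def cooccurrence_counts(entity_sets):
    """Count how often each unordered pair of entities shares a sentence."""
    counts = Counter()
    for entities in entity_sets:
        for pair in combinations(sorted(entities), 2):
            counts[pair] += 1
    return counts

def candidate_links(entity_sets, min_count=2):
    """Return pairs seen together at least min_count times, most frequent first."""
    counts = cooccurrence_counts(entity_sets)
    frequent = [(pair, n) for pair, n in counts.items() if n >= min_count]
    return sorted(frequent, key=lambda item: (-item[1], item[0]))

links = candidate_links(sentences)
for (first, second), n in links:
    print(f"{first} <-> {second}: seen together {n} times")
```

A real system would weigh far more evidence than raw counts, but the shape is the same: entities go in, and scored candidate relationships come out, ready to be written to a database table.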
The results vary depending upon the style of the text, the nature of the query, and the clarity of the writing, but in good circumstances the tool can deliver better results than humans can. The developers even report that some studies have shown that DeepDive “exceeded the quality of human volunteer annotators in both precision and recall for complex scientific articles.”
Bitcoin may be many things, but it is not as anonymous as many assume. The system tracks all transactions, so it's possible to trace a single coin from the date it was born, through every owner, to its current one. ZeroCoin wants to change that. The proposed system will establish a parallel world where coins will enter and leave, erasing the trail. It promises privacy and security in one.
The system establishes a new temporary currency called a ZeroCoin that’s kept in a big, anonymous pool that doesn’t track ownership or provenance. The true owner can spend the coin by creating a zero-knowledge proof that establishes their rightful control without revealing their identity. The coin is then removed from the anonymous pool and converted back into a regular bitcoin.
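For readers who have not seen a zero-knowledge proof, the sketch below shows the general flavor in Python. It is not ZeroCoin's construction. It is a textbook Schnorr proof of knowledge, made non-interactive with a hash, in which the prover shows it knows a secret number without revealing it. The group parameters are toy values chosen for readability and are far too small to be secure. All function names are illustrative.

```python
import hashlib
import secrets

# Toy group (an assumption for illustration): G has order Q modulo P.
P, Q, G = 23, 11, 2

def challenge(public, commitment):
    """Derive the challenge by hashing the public values (Fiat-Shamir style)."""
    data = f"{public}:{commitment}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q

def prove(secret):
    """Prove knowledge of `secret` behind public = G**secret mod P."""
    public = pow(G, secret, P)
    nonce = secrets.randbelow(Q)          # fresh randomness hides the secret
    commitment = pow(G, nonce, P)
    c = challenge(public, commitment)
    response = (nonce + c * secret) % Q
    return public, commitment, response

def verify(public, commitment, response):
    """Check G**response == commitment * public**c (mod P) without seeing the secret."""
    c = challenge(public, commitment)
    return pow(G, response, P) == (commitment * pow(public, c, P)) % P

public, commitment, response = prove(secret=7)
print(verify(public, commitment, response))
```

The verifier learns that the prover holds the secret but never sees the secret itself, which is the property that lets an owner spend a coin without exposing an identity.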
“Our goal is to build a cryptocurrency where your neighbors, friends, and enemies can’t see what you bought or for how much,” ZeroCoin’s developers say.
Institution: Johns Hopkins University
Finding the best route or the optimal answer can be harder than looking for a needle in a haystack. Many problems have billions, trillions, or even quadrillions of possible solutions, and finding the best one takes plenty of computing power.
Burlap lets you define the problem as a network of nodes with vectors of features or attributes attached to them. The algorithms can search through the network using a combination of brute-force searching and statistically guided exploration. The higher level of the algorithm plans the search and deploys the best algorithms. The toolkit includes dozens of the most useful algorithms for agent-based search.
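The idea of a guided search through a network of nodes can be sketched in a few lines of Python. This is not Burlap's API (Burlap is a Java toolkit). It is a generic A*-style best-first search, and the small `graph` and the zero-valued heuristic are made-up inputs for illustration.

```python
import heapq

def best_first_search(graph, start, goal, heuristic):
    """A*-style search: always expand the node with the lowest cost so far plus heuristic."""
    frontier = [(heuristic(start), 0, start, [start])]
    best_cost = {start: 0}
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if cost > best_cost.get(node, float("inf")):
            continue  # a cheaper route to this node was already found
        for neighbor, step in graph.get(node, {}).items():
            new_cost = cost + step
            if new_cost < best_cost.get(neighbor, float("inf")):
                best_cost[neighbor] = new_cost
                priority = new_cost + heuristic(neighbor)
                heapq.heappush(frontier, (priority, new_cost, neighbor, path + [neighbor]))
    return None  # goal unreachable

# Hypothetical problem: edge weights are the cost of moving between nodes.
graph = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 2, "D": 5},
    "C": {"D": 1},
    "D": {},
}
result = best_first_search(graph, "A", "D", heuristic=lambda node: 0)
print(result)
```

With a heuristic of zero this is plain brute-force uniform-cost search. Supplying an informed estimate of the remaining cost is what turns it into guided exploration.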
The tool is useful for data-driven worlds where the data can be mapped into a large collection of nodes or objects. The code is written in Java and includes a large assortment of debugging and profiling tools that are useful for keeping the code moving toward the optimal goal.
Smartphones may let us talk, text, and even watch cat videos, but their greatest contribution to society may be as mobile doctors, ready to track our health, day in and day out. Among the hundreds of new apps for tracking our bodies is SpiroSmart, a software program that analyzes our lungs by listening to us breathe and measuring the echoes and reverberations.
The traditional medical device, called a spirometer, requires people to breathe through a tiny windmill that measures the intensity of the airflow. Using a microphone reduces the danger of contamination and makes it possible for people to test their breathing discreetly throughout the day.
The project is one part of a collection of tools analyzing lung health. Another tool, CoughSense, will record the number and severity of “cough episodes” during a day. It replaces specialized equipment or paper logs. Another approach, WiiBreathe, watches the distortion of Wi-Fi signals in the 2.4GHz range as they pass through the body and the lungs. It can track breathing within “the accuracy of 1.54 breaths per minute when compared to a clinical respiratory chest band.” All promise to reduce the need for specialized hardware, making testing simpler and more effective for all users.
Institution: University of Washington
As digital photography becomes more common, it’s only natural that people will want to do more to their images than merely look at them. Some want to filter the colors, others want to edit the images, and still more want to use the images as input to some algorithm, perhaps for steering an autonomous car.
All of these algorithms require loops -- lots and lots of nested loops churning through the rows and columns of pixels. Structuring these loops with careful attention to how data is cached can make a big difference in speed. If you want to convert your algorithm to run on a GPU, you’ll need to rethink the loops all over again.
Halide is a computer language for image processing designed to abstract away these decisions. It will worry about the loops and GPU conversions for you. If you write the instructions for analyzing a single pixel, it will produce fast code for churning through the entire image.
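The split between what to compute for one pixel and how to loop over the image can be illustrated in plain Python. This is not Halide syntax. It is a hand-rolled sketch in which `brighten` is the per-pixel rule and the two `run_*` functions are alternative loop orders. All names are invented for the example.

```python
def brighten(pixel):
    """The per-pixel rule: what to compute, with no mention of loops."""
    return min(255, pixel + 40)

def run_row_major(image, func):
    """One loop order: visit pixels row by row."""
    return [[func(p) for p in row] for row in image]

def run_tiled(image, func, tile=2):
    """Another loop order: visit pixels in tile-by-tile blocks, as cache-aware code often does."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            for y in range(ty, min(ty + tile, height)):
                for x in range(tx, min(tx + tile, width)):
                    out[y][x] = func(image[y][x])
    return out

image = [[0, 100, 200], [250, 30, 60], [90, 120, 230]]
row_major = run_row_major(image, brighten)
tiled = run_tiled(image, brighten)
print(row_major == tiled)
```

Both loop orders produce the same picture. Only the order of memory accesses changes, and that is the kind of decision a language like Halide aims to take off the programmer's hands.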
Cameras have traditionally been used to take static photos of things to save for the future. The things might be moving when the shutter snaps, but after that, they’re frozen for eternity like people on a Grecian urn. They do what your eyes do by capturing light forever.
Now that superfast cameras can capture hundreds or thousands of images per second, researchers are discovering that the cameras can do more than imitate the eyes. They can also do what our ears and skin can do by sensing sound or vibration using light alone.
The Visual Microphone project uses a series of images to detect small movements in an object. In the demonstration video, Visual Microphone watches for tiny movements that a crinkly potato chip bag creates when sound hits the bag. The vibrations may be very slight, but they’re enough for the software to recover a reasonable approximation of the sound.
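A toy version of the idea fits in a short Python sketch. It is not the team's algorithm, which analyzes subtle motion in real video. Here each frame is reduced to its mean brightness, and the pitch is estimated by counting sign changes. The synthetic clip, the frame rate, and the 50Hz tone are all assumptions made up for the example.

```python
import math

def recover_signal(frames):
    """Reduce each frame to its mean brightness, then center the series on zero."""
    means = [sum(frame) / len(frame) for frame in frames]
    baseline = sum(means) / len(means)
    return [m - baseline for m in means]

def estimate_frequency(signal, frame_rate):
    """Estimate the dominant frequency in Hz by counting sign changes."""
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0))
    duration = (len(signal) - 1) / frame_rate
    return crossings / (2 * duration)

# Synthetic clip: four pixels whose brightness wobbles very slightly at 50 Hz,
# filmed at 2,000 frames per second for one second.
FRAME_RATE, TONE_HZ = 2000, 50
frames = [
    [128 + 0.5 * math.sin(2 * math.pi * TONE_HZ * (n + 0.5) / FRAME_RATE)] * 4
    for n in range(FRAME_RATE)
]
estimated = estimate_frequency(recover_signal(frames), FRAME_RATE)
print(f"estimated tone: about {estimated:.0f} Hz")
```

The wobble here is only half a brightness level out of 255, far too small to see, yet the frame-to-frame series still carries the tone. The camera has to shoot much faster than the sound vibrates, which is why the high frame rates matter.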
The team is applying the same general idea to other problems like determining whether a building or a bridge is stable and safe. They can use a sequence of images from a windy day to look for small or not so small changes in the building. Dangerous resonant vibrations may not be large enough to be seen by a human or even felt, but the camera can flag them.
The idea is simple enough to spawn a number of other sensors. Cameras can take our pulses by tracking the flow of blood through the subtle blushing of the skin. Video crib monitors can count the breaths of an infant by watching the expansion of the chest. In these cases, the camera is not only more efficient, but also safer because it doesn’t make contact and works from a distance.
Institution: Massachusetts Institute of Technology
Robots and drones are becoming more and more common in the enterprise as they move from the labs and take on crucial roles. Controlling these machines requires a good grasp of the laws of physics. Drake is a collection of packages that makes it a bit easier to write the code controlling these machines.
The code delivers a number of basic and not-so-basic models for predicting how your robot will move. You can begin with rigid body models, layer in aerodynamic results, and feed it all into a dynamic control algorithm. There’s also a complement of visualization tools to debug your code and watch how it behaves.
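The loop of "model the physics, then feed it to a controller" can be shown with a deliberately tiny Python sketch. It does not use Drake or its API. It simulates a one-dimensional point mass obeying F = ma and steers it with a simple proportional-derivative (PD) controller. The gains, the time step, and the function name are illustrative assumptions.

```python
def simulate(target, steps=2000, dt=0.01, mass=1.0, kp=20.0, kd=8.0):
    """Drive a 1-D point mass toward `target`; return the final (position, velocity)."""
    position, velocity = 0.0, 0.0
    for _ in range(steps):
        # Controller: push toward the target, and brake in proportion to speed.
        force = kp * (target - position) - kd * velocity
        # Physics model: Newton's second law for a rigid point mass.
        acceleration = force / mass
        # Semi-implicit Euler integration over one time step.
        velocity += acceleration * dt
        position += velocity * dt
    return position, velocity

final_position, final_velocity = simulate(target=1.0)
print(f"position={final_position:.3f}, velocity={final_velocity:.3f}")
```

A toolkit for real robots replaces the one-line physics with far richer models and the PD rule with more capable controllers, but the structure of the loop is the same.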
Institution: Massachusetts Institute of Technology
Anyone who’s spent time with big data or data scientists knows that they rely, more often than not, on a language called R to chew through the numbers and deliver the kind of statistical insights that make managers happy. Whether it’s marketing, risk management, scheduling, or any of a host of other jobs for keeping an enterprise running, R is tuned for the statistical analysis that proves or disproves a hypothesis.
Institution: Foundation hosted by Vienna University of Economics and Business
Now, saving the best for last, is the one thing that universities do better than anyone: teach. All of these projects are nice, but many schools are also open-sourcing and sharing their courses. They’re sharing the course materials, streaming video lectures, and even organizing the kind of study groups and grading sessions that turn a lecture or a book into a full course.
There are dozens of good courses, so it’s possible to knit together a complete degree for free (or at a low cost). These two GitHub repositories are pointers to a few of the real courses out there. Drink deeply because you won’t be limited by, say, tuition.
Institution: Many universities
GitHub: https://github.com/mvillaloboz/open-source-cs-degree and https://github.com/datasciencemasters/go
This story, "9 research projects that could transform the enterprise" was originally published by InfoWorld.