Netflix is increasingly turning to Python over Java to power certain aspects of its video-streaming service, such as generating and processing alerts, boosting resilience, securing transactions, producing deployable AMIs (Amazon Machine Images), and managing and automating Cassandra clusters.
Roy Rapoport, monitoring engineering manager at Netflix, has revealed that Python is giving Java a run for its money among developers at Netflix, citing the language's "rich batteries-included standard library, succinct and clean yet expressive syntax, large developer community, and wealth of third-party libraries."
Netflix has developed a RESTful Web application called CAG (Central Alert Gateway) that's capable of grabbing the hundreds of thousands of daily alerts generated by the company's telemetry system and intelligently disseminating or suppressing them on a case-by-case basis. Some alerts, for example, are automatically dispatched to the company's notification system to page on-call engineers. Some are suppressed if the proper individuals have already been notified. In some cases, CAG automatically performs remediation actions, such as rebooting or terminating potentially unhealthy AWS (Amazon Web Services) EC2 instances.
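The dispatch-suppress-remediate decision described above can be pictured with a small Python sketch. This is an illustration only, not Netflix's code: the `Alert` fields, the `Action` names, the `unhealthy_instance` category, and the `route` function are all hypothetical, and the rules are deliberately simplified.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    PAGE = "page"            # hand off to the notification system
    SUPPRESS = "suppress"    # someone has already been told
    REMEDIATE = "remediate"  # e.g. reboot or terminate an instance


@dataclass(frozen=True)
class Alert:
    source: str  # hypothetical: the instance or service that raised the alert
    kind: str    # hypothetical alert category


def route(alert, already_notified):
    """Pick an action for one alert.

    `already_notified` is a set of (source, kind) pairs that have been
    paged before; it is updated in place when a new page goes out.
    """
    # Assumed rule: unhealthy instances are fixed automatically.
    if alert.kind == "unhealthy_instance":
        return Action.REMEDIATE

    key = (alert.source, alert.kind)
    if key in already_notified:
        return Action.SUPPRESS

    already_notified.add(key)
    return Action.PAGE
```

Calling `route` twice with the same alert and the same set pages the first time and suppresses the second, which is the case-by-case behavior the article attributes to CAG.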
The company uses a tool called Chaos Gorilla -- a cousin to its open source Chaos Monkey -- to test resiliency at a large scale. Chaos Gorilla integrates with Asgard and Edda to simulate the loss of an entire availability zone in a given region. "This sort of failure mode -- an AZ (Amazon Availability Zone) either going down or simply becoming inaccessible to other AZs -- happens once in a blue moon, but it's a big enough problem that simulating it and making sure our entire ecosystem is resilient to that failure is very important to us," wrote Rapoport.
On the security front, Netflix employs Security Monkey and Howler Monkey. The former is designed to track configuration history and to generate alerts about changes to EC2 security-related policies. The latter's purpose is to discover and track SSL certificates in Netflix's environments and domain names and to alert the proper recipients as SSL certificate expiration dates draw near. According to Rapoport, the tool has helped to eliminate instances of production outages due to SSL expirations over the past 18 months.
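As a rough illustration of the expiration check attributed to Howler Monkey, the sketch below takes a mapping of domain names to certificate expiry times and reports which ones fall inside a warning window. The function name `expiring_soon`, the 30-day default, and the sample domains are assumptions made for the example; how the real tool discovers certificates is not described in the source.

```python
from datetime import datetime, timedelta, timezone


def expiring_soon(certs, now, warn_days=30):
    """Return (name, days_left) pairs for certificates that expire
    within `warn_days` of `now`, most urgent first.

    `certs` maps a domain name to its expiry time (a timezone-aware
    datetime). Certificates that have already expired get a negative
    days_left value.
    """
    cutoff = now + timedelta(days=warn_days)
    due = [
        (name, (not_after - now).days)
        for name, not_after in certs.items()
        if not_after <= cutoff
    ]
    return sorted(due, key=lambda item: item[1])


if __name__ == "__main__":
    now = datetime(2013, 3, 1, tzinfo=timezone.utc)
    sample = {
        "api.example.com": now + timedelta(days=10),
        "www.example.com": now + timedelta(days=90),
    }
    for name, days_left in expiring_soon(sample, now):
        print(f"{name} expires in {days_left} days")
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the check easy to test and to rerun against historical data.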
The company employs Chronos to handle most of its change-control process. The tool integrates with Netflix's Simian Army (the aforementioned Monkeys) and Asgard to automatically track changes, including event types like deployments, security events, and other automated actions. "Chronos accepts events via a REST interface and allows humans and machines to ask questions like, 'What happened in the last hour?' or, 'What software did we deploy in the last day?'" according to Rapoport.
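The kind of question Rapoport mentions comes down to a time-window query over recorded events. The in-memory sketch below shows the idea; it is not Chronos's API, and the `Event` fields, the `EventLog` class, and the `deployment` event type are hypothetical names chosen for the example. A real service would sit behind a REST interface and a persistent store.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    timestamp: datetime
    kind: str    # hypothetical type, e.g. "deployment" or "security"
    detail: str


class EventLog:
    """Minimal in-memory stand-in for a change-tracking service."""

    def __init__(self):
        self._events = []

    def record(self, event):
        self._events.append(event)

    def since(self, now, window, kind=None):
        """Return events in [now - window, now], optionally of one kind."""
        start = now - window
        return [
            e for e in self._events
            if start <= e.timestamp <= now
            and (kind is None or e.kind == kind)
        ]


if __name__ == "__main__":
    now = datetime(2013, 3, 1, 12, 0)
    log = EventLog()
    log.record(Event(now - timedelta(minutes=20), "deployment", "api v42"))
    log.record(Event(now - timedelta(hours=5), "security", "policy change"))

    # "What happened in the last hour?"
    print(log.since(now, timedelta(hours=1)))
    # "What software did we deploy in the last day?"
    print(log.since(now, timedelta(days=1), kind="deployment"))
```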