At Ooyala, real-time user data arrived every second. Ooyala’s algorithms consumed approximately 12 million relevant user engagement events every hour to incrementally update the behavior-based video-to-video similarity graph. Products such as Ooyala Discovery built user models for nearly 200 million users and updated their interest profiles on the fly as we observed their viewing activity. My intern project was to create internal tools to manage our cloud storage, in support of Ooyala’s mission to help customers deliver personalized videos to every screen everywhere.
Leveraging Big Data with AWS, Chef, and OpenStack
At Ooyala, we used a number of AWS services, including:
Amazon EC2 + Amazon VPC
AWS Direct Connect
These services provided a scalable and cost-effective way to transport, store, and access billions of log files. My internal tool supported:
S3 bucket reuse (AWS allowed 100 S3 buckets per account by default)
Cache-control headers on objects to ensure caching benefits
Secure SSL delivery through CloudFront with custom certificates
My internal tool supported a Chef cookbook that stored sensitive data for Chef to deliver to the clients that needed it. At Ooyala, more and more of our server and application configurations were managed with Chef. Chef is a platform that lets users manage their servers programmatically, providing great power and flexibility in customizing configurations on the fly, depending on various attributes of each target server.
Beyond writing configuration files, Chef can configure the operating system, install and configure applications, start and stop them in given situations, and more, using "recipes," specialized scripts written in the Chef DSL, an extension of Ruby. The recipes are organized into "cookbooks," collections of recipes generally related to a single service. When a client server contacts its Chef server, the server authenticates the client via the client's key and sends the client its run list, the recipes it needs to execute. Each client's specific attributes then determine how each recipe behaves on that client.
Chef's power and flexibility allowed all Ooyala systems, both physical and virtual, across multiple OS platforms, application roles, development environments, and geographical regions, to be managed using the same set of cookbooks. Configuration became increasingly automated as we wrote or modified cookbooks to manage applications.
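To make the run-list idea concrete, here is a toy model in plain Ruby. It is not Chef's API and not Ooyala's cookbooks: the recipe names, node names, attributes, and the `converge` helper are all hypothetical, and real Chef recipes declare resources rather than return strings. The sketch only shows how one shared set of recipes can behave differently depending on each node's attributes.

```ruby
# Toy model only (not Chef's API). Each hypothetical recipe takes a node's
# attributes and returns the configuration line it would write.
RECIPES = {
  'ntp'   => ->(attrs) { "ntp server: #{attrs[:region]}.ntp.example.com" },
  'nginx' => ->(attrs) { "nginx workers: #{attrs[:cpus]}" }
}.freeze

# Hypothetical nodes, each with its own attributes and run list.
NODES = {
  'web-us' => { attrs: { region: 'us', cpus: 8 }, run_list: %w[ntp nginx] },
  'db-eu'  => { attrs: { region: 'eu', cpus: 4 }, run_list: %w[ntp] }
}.freeze

# Execute a node's run list in order, passing that node's attributes
# to every recipe it names.
def converge(node_name)
  node = NODES.fetch(node_name)
  node[:run_list].map { |name| RECIPES.fetch(name).call(node[:attrs]) }
end

puts converge('web-us')
puts converge('db-eu')
```

Both nodes run the same `ntp` recipe, but the resulting configuration differs because the `region` attribute differs. That is the property that lets one set of cookbooks cover many kinds of servers.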
My internal tool also supported OpenStack, which Ooyala used to implement a private cloud that scaled personalized video delivery cost-effectively.
I developed an easy-to-use interface for engineers to check the status of server instances, backed by a cron job that retrieved the server data periodically. The web tool was built using Ruby, Sinatra, and MySQL.
For data joins, we used techniques such as Bloom filters to filter out irrelevant data as early as possible. We also took advantage of the characteristics of the data itself. For example, when joining content metadata with behavior data, we made use of the fact that new video content has limited behavior data.
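As an illustration of the Bloom filter technique, the following is a minimal sketch in Ruby, not the production implementation. The `BloomFilter` class, its size and hash-count parameters, and the sample metadata and events are all assumptions made for the example. The filter is built from the keys of one side of the join and used to discard non-matching records from the other side before the exact join runs.

```ruby
require 'digest'

# Minimal Bloom filter: a fixed-size bit array plus k hash probes per key.
class BloomFilter
  def initialize(size_bits, num_hashes)
    @size = size_bits
    @num_hashes = num_hashes
    @bits = Array.new(size_bits, false)
  end

  def add(key)
    positions(key).each { |pos| @bits[pos] = true }
  end

  # May return a false positive, but never a false negative.
  def include?(key)
    positions(key).all? { |pos| @bits[pos] }
  end

  private

  # Derive k positions by double hashing over the two halves of an MD5 digest.
  def positions(key)
    digest = Digest::MD5.hexdigest(key.to_s)
    h1 = digest[0, 16].to_i(16)
    h2 = digest[16, 16].to_i(16) | 1 # force an odd step so probes spread out
    (0...@num_hashes).map { |i| (h1 + i * h2) % @size }
  end
end

# Hypothetical sample data: metadata for three videos, events for five.
metadata = { 'v1' => 'Cooking basics', 'v2' => 'Surf highlights', 'v3' => 'Chef intro' }
events = [
  { video_id: 'v1', user: 'u1' },
  { video_id: 'v9', user: 'u2' },
  { video_id: 'v2', user: 'u1' },
  { video_id: 'v7', user: 'u3' },
  { video_id: 'v3', user: 'u2' }
]

# Build the filter from the join keys of the metadata side.
filter = BloomFilter.new(1024, 3)
metadata.each_key { |id| filter.add(id) }

# Cheap pre-filter first: drop events whose video is definitely not in metadata.
candidates = events.select { |e| filter.include?(e[:video_id]) }

# The exact join then removes any false positives the filter let through.
joined = candidates
         .select { |e| metadata.key?(e[:video_id]) }
         .map { |e| e.merge(title: metadata[e[:video_id]]) }

joined.each { |row| puts row.inspect }
```

Because a Bloom filter can report false positives, the exact key check stays in place after the pre-filter. The benefit is that most irrelevant records are dropped early, before they reach the more expensive join step.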