Agenda

7:30am - 5pm Registration Open
7:30am - 8:50am Breakfast
7:30am - 8:50am Intro to HBase Session
HBase: Just the Basics
New to HBase? This session will cover the basics of HBase in a very straightforward way—including architecture, API, and schema design.

Jesse Anderson - Smoking Hand
Jesse is a Creative Engineer with years of experience in creating products and helping companies improve their software engineering. He strives to provide developers with the resources to learn new technologies and improve their skillsets. To help the local community, he volunteers time as the President of the Northern Nevada Software Developers Group.

9am-10:30am General Session
Welcome Messages/State of HBase
With HBase hitting the 1.0 mark and adoption/production use cases continuing to grow, it's been an exciting year since last we met at HBaseCon 2014. What is the state of HBase today, and where does it go from here?

Andrew Purtell - Salesforce.com
Andrew is a committer and PMC member for HBase, and is an Architect at Salesforce.com working on cloud storage. Previously, Andrew worked at Intel, Trend Micro, Sparta, and McAfee.

Enis Söztutar - Hortonworks
Enis is a Member of the Technical Staff at Hortonworks, an Apache HBase, Apache Hadoop, and Apache Gora committer, and a member of the Apache Software Foundation. He has been using and developing Hadoop ecosystem projects since 2007.

Michael Stack - Cloudera
Michael is a Software Engineer at Cloudera. He is the VP/PMC Chair for Apache HBase, as well as a PMC Member of Apache Hadoop.


Bigtable and HBase: Storing the World's Data
HBase is based on Bigtable, designed over a decade ago at Google. This architecture is the choice for mind-bendingly large datasets both inside and outside of Google because it scales structured data unlike anything else. Carter, the engineering manager for Bigtable in New York City, will talk about how Google has continued to improve Bigtable to meet ever-larger datasets and more demanding performance requirements.

Carter Page - Google
Carter Page is an engineer and manager on the Bigtable development team at Google in New York City. For the last 19 years, Carter has worked on high-performance distributed software across several industries, including media, finance, and education.

Zen: A Graph Data Model on HBase
Zen is a storage service built at Pinterest that offers a graph data model on top of HBase and potentially other storage backends. In this talk, RVP and Xun will go over the design motivation for Zen and describe its internals, including the API, type system, and HBase backend.

Raghavendra Prabhu - Pinterest
Raghavendra Prabhu (aka RVP) manages the infrastructure team at Pinterest, which is responsible for core backend infrastructure including storage systems, caching, service framework, and core business logic. Prior to Pinterest, RVP worked for many years on storage and search infrastructure at Twitter, Google, and Microsoft.

Xun Liu - Pinterest
Xun is a software engineer on the infrastructure team at Pinterest. He has worked in many areas and currently focuses on storage and caching solutions. Before Pinterest, Xun was a staff software engineer at Google and worked on display ads and search quality.

The Evolution of HBase @ Bloomberg
Learn the evolution and consolidation of Bloomberg's core infrastructure around fewer, faster, and simpler systems, and the role HBase plays within that effort. You'll also hear about HBase modifications to accommodate the "medium data" use case and get a preview of what's to come.

Matthew Hunt - Bloomberg
Matthew works on systems architecture for Bloomberg and its Portfolio Analytics product, which comprises real time and historical analytics for returns, risk, optimization, and attribution. He has been lucky enough to have been the CTO at several startups, and served as the president of LUNY!, the Linux Users of New York.

Sudarshan Kadambi - Bloomberg
Sudarshan is an Architect at Bloomberg helping evolve Bloomberg's Data and Compute infrastructure. He has a background in distributed systems from his days at Stanford and Yahoo!. He has been a user of Hadoop since 2008 and is passionate about making it awesome.

10:40am - 11am Break
Operations
Development & Internals
Ecosystem
Use Cases
11am - 11:40am
HBase Operations in a Flurry
With multiple clusters of 1,000+ nodes replicated across multiple data centers, Flurry has learned many operational lessons over the years. In this talk, you'll explore the challenges of maintaining and scaling Flurry's cluster, how we monitor, and how we diagnose and address potential problems.

Rahul Gidwani - Yahoo!
Rahul is an engineer on the platform team at Flurry/Yahoo!. For the past few years, he has been working with HBase and scaling Flurry's cluster.


Ian Friedman - Yahoo!
Ian is a Senior Software Engineer on Flurry's Platform Team. He works primarily on Flurry's data ingestion and metrics aggregation pipeline, which continuously processes over 20 TB of mobile analytics event data per day. He also helps manage and troubleshoot Flurry's 2,000+ node Hadoop/HBase cluster.

Meet HBase 1.0
HBase 1.0 is the new stable major release, and the start of "semantic versioned" releases. We will cover new features, changes in behavior and requirements, source/binary and wire compatibility details, and upgrading. We'll also dive deep into the new standardized client API in 1.0, which establishes a separation of concerns, encapsulates what is needed from how it's delivered, and guarantees future compatibility while freeing the implementation to evolve.

Enis Söztutar - Hortonworks
Enis is a Member of the Technical Staff at Hortonworks, an Apache HBase, Apache Hadoop, and Apache Gora committer, and a member of the Apache Software Foundation. He has been using and developing Hadoop ecosystem projects since 2007.

Solomon Duskis - Google
Solomon has been working on HBase since October 2014 and has focused on efforts relating to the HBase 1.0 client standardization efforts. He works for the Google Bigtable team.

HBase as an IoT Stream Analytics Platform for Parkinson's Disease Research
In this session, you will learn about a solution developed in partnership between Intel and the Michael J. Fox Foundation to enable breakthroughs in Parkinson's disease (PD) research by using wearable sensors and smartphones to monitor PD patients' motor movements 24/7. We'll elaborate on how we're using HBase for time-series data storage and integrating it with various stream, batch, and interactive technologies. We'll also review our efforts to create an interactive querying solution over HBase.

Ido Karavany - Intel
Ido is a Big Data Analytics Architect and Development Manager in Intel's Advanced Analytics group. He is responsible for leading-edge technology projects within Intel involving Big Data and stream analytics solutions in the Internet of Things and Parkinson's disease research.

HBase @ Flipboard
Flipboard serves over 100 million users using heterogeneous results including user-generated content, interest profiles, algorithmically generated content, social firehose, friends graph, ads, and web/RSS crawlers. To personalize and serve these results in real time, Flipboard employs a variety of data models, access patterns, and configurations. This talk will present how some of these strategies are implemented using HBase.

Sang Chi - Flipboard
Sang is currently running the data and search platform at Flipboard, focused on running scalable data solutions/infrastructure for analytics, products and search. He has used HBase since 2010 across user graph, magazine storage, metrics, personalized feed, feature extraction, ranking, ads, and more.

Jason Culverhouse - Flipboard
Jason Culverhouse is a Software Engineer at Flipboard.

Matt Blair - Flipboard
Matt is a Software Engineer at Flipboard and has worked on infrastructure there since 2011, with a recent focus on building backend services that leverage its data pipeline and various distributed databases, including HBase.

11:50am - 12:30pm
HBase at Scale in an Online and High-Demand Environment
Pinterest runs 38 different HBase clusters in production, doing a lot of different types of work—with some doing up to 5 million operations per second. In this talk, you'll get details about how we do capacity planning, maintenance tasks such as online automated rolling compaction, configuration management, and monitoring.

Jeremy Carroll - Pinterest
Jeremy is one of the foundational members of the Site Reliability Engineering team at Pinterest. He helps design, build, and monitor Pinterest's applications and systems infrastructure that currently handles billions of monthly page views with tremendous growth and scalability challenges.

HBase 2.0 and Beyond: Panel
Now that you've seen HBase 1.0, what's ahead in HBase 2.0 and beyond, and why? Find out from this panel of people who have designed and/or are working on 2.0 features.

Matteo Bertozzi - Cloudera
Matteo Bertozzi is a Software Engineer at Cloudera, and an HBase committer/PMC member.

Sean Busbey - Cloudera
Sean is a committer on the Apache HBase project. He is also a PMC member on Apache Accumulo. He currently works at Cloudera as a Software Engineer on the storage team.


Jingcheng Du - Intel
Jingcheng works for Intel Big Data Team as a Senior Software Engineer. He has worked on developing HBase features since 2012 and is also an HBase contributor.

Lars Hofhansl - Salesforce.com
Lars is an Apache HBase Committer and PMC member. He is an Architect at Salesforce.com, where he leads HBase development efforts, recently focusing on performance, backup, and disaster recovery. In the past, Lars held engineering roles at PeopleSoft and Digital Equipment Corp.

Jonathan Hsieh - Cloudera
Jonathan is a Software Engineer with Cloudera, currently focused on the HBase project. He is an HBase committer and PMC member, a committer and founder of the Apache Flume project, and a committer on the Apache Sqoop project.

Enis Söztutar - Hortonworks
Enis is a Member of the Technical Staff at Hortonworks, an Apache HBase, Apache Hadoop, and Apache Gora committer, and a member of the Apache Software Foundation. He has been using and developing Hadoop ecosystem projects since 2007.

Jimmy Xiang - Cloudera
Jimmy is a Software Engineer at Cloudera, and an HBase committer/PMC member.

Apache Phoenix: The Evolution of a Relational Database Layer over HBase
Phoenix has evolved to become a full-fledged relational database layer over HBase data. We'll discuss the fundamental principles of how Phoenix pushes the computation to the server and why this leads to performance that enables direct support of low-latency applications, along with some major new features. Next, we'll outline our approach for transaction support in Phoenix, a work in progress, and discuss the pros and cons of the various approaches. Lastly, we'll examine the current means of integrating Phoenix with the rest of the Hadoop ecosystem.

James Taylor - Salesforce.com
James is an architect at Salesforce.com in the Big Data Group. He founded the Apache Phoenix project and leads the development effort. Prior to working at Salesforce.com, James worked at BEA Systems on projects such as a federated query processing system and an event driven programming platform and has worked at various other start-ups in the computer industry over the past 20+ years.

Maryann Xue - Intel
Maryann is a software engineer on the Big Data Technologies team at Intel and a PMC member of the Phoenix project.


Graph Processing of Stock Market Order Flow in HBase on AWS
In this session, we will briefly cover the FINRA use case and then dive into our approach with a particular focus on how we leverage HBase on AWS. Among the topics covered will be our use of HBase bulk loading and ExportSnapshot for backup. We will also cover some lessons learned and experiences of running a persistent HBase cluster on AWS.

Aaron Carreras - FINRA
Aaron is director of enterprise data at FINRA. His team works on all aspects of data coming in and out of FINRA. For the past two years, his focus has primarily been on design, development, and rollout of FINRA's first two HBase-backed applications in the cloud. He has spent his entire career working on applications and data in the Finance space.

12:30pm - 1:30pm Lunch
1:30pm - 2:10pm
OpenTSDB and AsyncHBase Update
OpenTSDB continues to scale along with HBase. A number of updates have been implemented to push writes over 2 million data points a second. Here we will discuss HBase schema improvements, including salting, random UID assignment, and using append operations instead of puts. You'll also get AsyncHBase development updates about rate limiting, statistics, and security.

Chris Larsen - Yahoo!
Chris is a software engineer at Yahoo! working on the monitoring team to store and process time-series data at a massive scale. He coordinates development on OpenTSDB and AsyncHBase with a great community of users and contributors. Previously, he helped publish OpenTSDB 2.0 while working at Limelight Networks.

Benoît Sigoure - Arista Networks
Benoît is the creator of OpenTSDB and AsyncHBase, although OpenTSDB has been largely maintained by Chris Larsen since the 2.0 release. Benoît currently works on new distributed systems at Arista Networks, where HBase plays a central role.

HBase Performance Tuning
At Salesforce, we have deployed many thousands of HBase/HDFS servers and learned a lot about tuning during this process. This talk will walk you through the many relevant HBase, HDFS, Apache ZooKeeper, Java/GC, and operating system configuration options and provide guidelines about which options to use in which situation, and how they relate to each other.

Lars Hofhansl - Salesforce.com
Lars is an Apache HBase Committer and PMC member. He is an Architect at Salesforce.com, where he leads HBase development efforts, recently focusing on performance, backup, and disaster recovery. In the past, Lars held engineering roles at PeopleSoft and Digital Equipment Corp.

Analyzing HBase Data with Apache Hive
Hive/HBase integration extends the familiar analytical tooling of Hive to cover online data stored in HBase. This talk will walk through the architecture for Hive/HBase integration with a focus on some of the latest improvements such as Hive over HBase Snapshots, HBase filter pushdown, and composite keys. These changes improve the performance of the Hive/HBase integration and expose HBase to a larger audience of business users. We'll also discuss our future plans for HBase integration.

Swarnim Kulkarni - Cerner
Swarnim is a Lead Architect with the Big Data team at Cerner Corporation. At Cerner, his team is focused on the design and development of infrastructure for ingestion of healthcare data in the cloud using Apache Hadoop technologies. He is also a contributor to Apache Hive.

Brock Noland - Cloudera
Brock is an engineer at StreamSets, an Apache Flume, Hive, Crunch, MRUnit, and Parquet (incubating) PMC member, and a mentor to Apache Nifi (incubating). Prior to StreamSets, he was an engineering manager at Cloudera.


Nick Dimiduk - Hortonworks
Nick is an Apache HBase committer, PMC member, and a co-author of HBase in Action. He works on the HBase team at Hortonworks, where his focus is on usability and performance.

Apache Kylin: Extreme OLAP Engine for Hadoop
Kylin is an open source distributed analytics engine contributed by eBay that provides a SQL interface and OLAP on Hadoop supporting extremely large datasets. Kylin's pre-built MOLAP cubes (stored in HBase), distributed architecture, and high concurrency help users analyze multidimensional queries via SQL and other BI tools. During this session, you'll learn how Kylin uses HBase's key-value store to serve SQL queries with relational schema.

Luke Han - eBay
Luke Han joined eBay in late 2011 as a staff BI architect on the BI Platform Team. Luke now serves as a product owner of the Kylin project and works closely with customers and partners to on-board their cases to Kylin. Prior to eBay, Luke was a Senior Consultant at Actuate China.

Yang Li - eBay
Yang Li joined eBay, Shanghai in Jan 2014 as a Member of the Technical Staff and is a key developer and architect of the Kylin OLAP Engine. He also leads the Kylin engineering team in Shanghai. Prior to eBay, Yang spent 8 years with IBM and 2 years with Morgan Stanley.
Running ML Infrastructure on HBase
Sift Science uses online, large-scale machine learning to detect fraud for thousands of sites and hundreds of millions of users in real-time. This talk describes how we leverage HBase to power an ML infrastructure including how we train and build models, store and update model parameters online, and provide real-time predictions. The central pieces of the machine learning infrastructure and the tradeoffs we made to maximize performance will also be covered.

Andrey Gusev - Sift Science
Andrey is ML infrastructure tech lead at Sift Science and enjoys machine learning, search, NLP, and distributed systems. Before Sift, Andrey was a lead engineer at Salesforce.com working on search and machine learning systems.

2:20pm - 3pm
Multitenancy in HBase: Learnings from Yahoo!
Since 2013, thanks to a combination of deployment and HBase enhancements, Yahoo! has successfully supported a diverse set of tenants in a single HBase cluster. Here you'll learn how this approach makes it feasible to support small and large tenants cost-effectively, minimizes operational overhead, and provides a lot of flexibility.

Francis Liu - Yahoo!
Francis is a Principal Software Engineer at Yahoo! working mainly on Apache HBase. He is also an Apache Hive contributor. Prior to that, he was involved in the development of a workflow management and incremental processing platform built on top of Apache Hadoop.

Vandana Ayyalasomayajula - Yahoo!
Vandana is a software engineer at Yahoo!. She currently works for the HBase team building various multitenancy features for HBase. She is a contributor to HBase and was also a PMC member and Committer for Apache HCatalog. Prior to Yahoo!, Vandana was a graduate student at UC Irvine.

Virag Kothari - Yahoo!
Virag works for the HBase team at Yahoo!, where his current focus is on challenges related to scalability and multitenancy. He is an HBase committer and a committer/PMC member for Apache Oozie.

Integrating HBase with Drools Rule Engine for Distributed Rule Processing
This session explains how the popular open source Drools rule engine can be integrated with HBase, and how to write Drools rules such that HBase client requests are intercepted and the request is classified accordingly if certain rules are fired. As you'll learn, this approach can be potentially useful for real-time recommendation or classification-type problems.

Dibyendu Bhattacharya - Pearson
Dibyendu is a Big Data Architect at Pearson, building a next-generation learning platform that will capture and analyze behavioral data across Pearson online learning applications in near real-time. He has experience in building enterprise applications and products leveraging SOA, cloud computing, distributed computing, and Big Data technologies.

HBase and Spark
In this session, learn how to build an Apache Spark or Spark Streaming application that can interact with HBase. In addition, you'll walk through how to implement common, real-world batch design patterns to optimize for performance and scale.

Ted Malaska - Cloudera
Ted is a Solutions Architect at Cloudera. He has 18 years of professional experience working for startups, the U.S. Federal Government, some of the world's largest banks, and the U.S.'s largest non-profit financial regulator. Ted is a regular contributor to Apache Flume, Apache Avro, Apache Pig, and YARN.

Optimizing HBase for the Cloud in Microsoft Azure HDInsight
Microsoft Azure's Hadoop cloud service, HDInsight, offers Hadoop, Storm, and HBase as fully managed clusters. In this talk, you'll explore the architecture of HBase clusters in Azure, which is optimized for the cloud, and a set of unique challenges and advantages that come with that architecture. We'll also talk about common patterns and use cases utilizing HBase on Azure.

Maxim Lukiyanov - Microsoft
Maxim is a program manager on the Big Data team at Microsoft. He is responsible for the Apache HBase cluster type in Azure HDInsight, focusing primarily on optimizing HBase for the cloud environment.

Industrial Internet Case Study using HBase and TSDB
This case study involves analysis of high-volume, continuous time-series aviation data from jet engines, consisting of temperature, pressure, vibration, and related parameters from the on-board sensors, joined with well-characterized, slowly changing engine asset configuration data and other enterprise data for continuous engine diagnostics and analytics. This data is ingested via a distributed fabric comprising transient containers, message queues, and columnar, compressed storage leveraging OpenTSDB over Apache HBase.

Shyam Varan Nath - GE
Shyam is a Big Data & Analytics Architect working at GE. His primary focus is Industrial Internet related solutions for aviation. Prior to GE, Shyam worked for IBM, Oracle, and Deloitte. He has over 23 years of industry experience in areas like data warehousing and advanced analytics.

Arnab Guin - GE
Arnab is a Staff Software Engineer, Big Data with General Electric's Predix Big Data Platforms group. His work focuses on developing and designing platforms encompassing high-speed ingestion, storage, and analytics. Prior to GE, Arnab worked on distributed genome sequencing algorithms at Complete Genomics and developed high-speed data pipelines at Tivo for high-volume viewership data.

3pm - 3:20pm Break
3:20pm - 4pm
Smooth Operators: Panel
Panelists from Pinterest, Facebook, Google, Bloomberg, Flipboard, and Dropbox spill their secrets about HBase ops and answer your questions.

Clay Baenziger - Bloomberg
Clay leads the Hadoop Infrastructure team at Bloomberg. Clay comes from a diverse background in systems infrastructure and analytics. At Sun Microsystems, his team built out an automated bare-metal Solaris deployment tool for Solaris engineering labs and his contributions were core to the OpenSolaris Automated Installer.

Jeremy Carroll - Pinterest
Jeremy is one of the foundational members of the Site Reliability Engineering team at Pinterest. He helps design, build, and monitor Pinterest's applications and systems infrastructure that currently handles billions of monthly page views with tremendous growth and scalability challenges.

Elliott Clark - Facebook
Elliott is an engineer at Facebook on the Apache HBase team. He's also an HBase PMC member and committer.

Dave Coyle - Dropbox
Dave was the first Hadoop SRE at Dropbox, which has a small team focusing on HDFS and HBase operations and reliability. Prior to that, he worked on Hadoop and other systems at Spotify, Morgan Stanley, and other companies.

Max Luebbe - Google
Max is a Site Reliability Engineer at Google's New York City office. In this role he is responsible for running a handful of services you probably use every day, specifically with regards to their availability and reliability. Prior to working at Google, he cofounded Pip.io, a social web startup in Palo Alto, CA.

Joey Parsons - Flipboard
Joey is on the operations team at Flipboard.

Events @ Box: Using HBase as a Message Queue
Box's /events API powers our desktop sync experience and provides users with a realtime, guaranteed-delivery event stream. To do that, we use HBase to store and serve a separate message queue for each of 30+ million users. Learn how we implemented queue semantics, were able to replicate our queues between clusters to enable transparent client failover, and why we chose to build a queueing system on top of HBase.

David Mackenzie - Box
David is a Staff Software Engineer at Box, where he's spent the past three years working on the infrastructure powering the company's desktop sync experience. He's currently building out Box's new HBase-backed guaranteed-delivery messaging infrastructure. Prior to Box, David worked at a small mobile telecom company building 3G network switches.

State of HBase Docs and How to Contribute
In this session, learn about the move to Asciidoc in HBase docs, some of the other notable changes lately, and things we've done to make it easier for you to contribute to the docs.

Misty Stanley-Jones - Cloudera
Misty is a senior technical writer at Cloudera, working on documentation for Apache HBase, CDH, and other storage-related projects. She has been heavily involved in Linux and open source since 1996. In past lives, she managed the middleware technical writing staff at Red Hat, wore a sysadmin hat for several years, and hacked the Solaris kernel for a while.
Warcbase: Scaling 'Out' and 'Down' HBase for Web Archiving
Web archiving initiatives around the world capture ephemeral web content to preserve our collective digital memory. However, that requires scalable, responsive tools that support exploration and discovery of captured content. Here you'll learn about why Warcbase, an open-source platform for managing web archives built on HBase, is one such tool. It provides a flexible data model for storing and managing raw content as well as metadata and extracted knowledge, tightly integrates with Hadoop for analytics and data processing, and relies on HBase for storage infrastructure.

Jimmy Lin - University of Maryland
Jimmy is an Associate Professor at the University of Maryland. From 2010-2012, he spent an extended sabbatical at Twitter working on analytics infrastructure and various data products.

S2Graph: A Large-scale Graph Database with HBase
As the operator of the dominant messenger application in South Korea, KakaoTalk has more than 170 million users, and our ever-growing graph has more than 10B edges and 200M vertices. This scale presents several technical challenges for storing and querying the graph data, but we have resolved them by creating a new distributed graph database with HBase. Here you'll learn the methodology and architecture we used to solve the problems, compare it to another well-known graph database, Titan, and explore the HBase issues we encountered.

Doyung Yoon - DaumKakao
Doyung started his career at Google as a Software Engineer and has worked for years on search engines and data mining. These days, he's fascinated by large-scale distributed systems.

Taejin Chin - DaumKakao
Taejin is a Software Engineer at DaumKakao. He built a distributed graph database for social graph data. He is also interested in graph theory, applied algorithms, and problem solving.

4:10pm - 4:50pm
HBase Operations at Xiaomi
In this session, you will learn about the work Xiaomi has done to improve the availability and stability of our HBase clusters, including cross-site data and service backup and a coordinated compaction framework. You'll also learn about the Themis framework, which supports cross-row transactions on HBase based on Google's Percolator algorithm, and its usage in Xiaomi's applications.

Shaohui Liu - Xiaomi
Shaohui is interested in distributed computing and storage systems. Currently, he focuses on the application and operation of HBase at Xiaomi. Prior to Xiaomi, he worked on an in-house MapReduce implementation and cluster management system at Tencent.

Jianwei Cui - Xiaomi
Jianwei is a software engineer at Xiaomi in China. His work focuses on the development and improvement of Apache HBase.

Reusable Data Access Patterns with CDAP Datasets
In this talk, you'll learn about Datasets, part of the open source Cask Data Application Platform (CDAP), which provide reusable implementations of common data access patterns. We will also look at how Datasets provide a set of common services that extend the capabilities of HBase: global transactions for multi-row or multi-table updates, read-less increments for write-optimized counters, and support for combined batch and real-time access.

Gary Helmling - Cask
Gary is a Committer and PMC member for the Apache HBase project. He works on HBase and Apache Hadoop development at Cask (formerly Continuuity), and has contributed to security, coprocessors, and the RPC stack. In past roles, Gary has worked at Twitter, Trend Micro, and Meetup.

Trafodion: Integrating Operational SQL into HBase
Trafodion, open sourced by HP, reflects 20+ years of investment in a full-fledged RDBMS built on Tandem's OLTP heritage and geared towards a wide set of mixed query workloads. In this talk, we will discuss how HP integrated Trafodion with HBase to take full advantage of the Trafodion database engine and the HBase storage engine, covering 3-tier architecture, storage, salting/partitioning, data movement, and more.

Anoop Sharma - Hewlett-Packard
Anoop is the lead Architect for the Trafodion program. He has worked in the areas of database technologies for many years at HP and has led design, development, and performance improvements for multiple database products.

Rohit Jain - Hewlett-Packard
Rohit, Database Distinguished & Chief Technologist at HP, leads an effort to build Big Data Apache Hadoop solutions while leveraging Apache HBase. He has also served as a solutions architect and a database consultant, developer, architect, development and QA manager, and product manager.

HBase @ CyberAgent
CyberAgent is a leading Internet company in Japan focused on smartphone social communities and a game platform known as Ameba, which has 40M users. In this presentation, we will introduce how we use HBase for storing social graph data and as a basis for ad systems, user monitoring, log analysis, and recommendation systems.

Toshihiro Suzuki - CyberAgent
Toshihiro joined CyberAgent in 2008. He is in charge of a log analysis system using Apache Hadoop and Apache Hive and a graph database built on Apache HBase. He is co-author of Beginner's Guide to HBase (Japanese language), which was released through Shoeisha in 2015.

Hirotaka Kakishima - CyberAgent
Hirotaka is a database engineer at CyberAgent. He has administered HBase clusters for two years. He is a co-author of Beginner's Guide to HBase (Japanese language), which was released through Shoeisha in 2015.
 
Blackbird Collections: In-situ Stream Processing in HBase
Blackbird is a large-scale object store built at Rocket Fuel. It stores 100+ TB of data and provides real-time access to 10 billion+ objects in 2-3 milliseconds, at a rate of 1 million+ requests per second. In this talk (an update from HBaseCon 2014), we will describe Blackbird's comprehensive collections API and show how it can be used to model collections such as sets and maps, as well as aggregates on those collections such as counters. We will also illustrate the flexibility and power of the API by modeling custom collection types that are unique to the Rocket Fuel context.

Ishan Chhabra - Rocket Fuel
Ishan is a Technical Lead at Rocket Fuel, with a focus on building the next generation of real-time storage and processing systems. Hadoop, HBase, Storm and Clojure are his tools of choice for tackling complexity and scalability challenges of storing and analyzing petabytes of data generated and stored at Rocket Fuel. Prior to Rocket Fuel, he worked at Bell Labs to enable privacy in large-scale recommendation systems using distributed middleware, acquiring a patent in the process.

Nitin Aggarwal - Rocket Fuel
Nitin is a Software Engineer at Rocket Fuel where he builds data applications using Apache HBase, MapReduce, YARN, and Apache Storm to enable faster access and easier analysis of petabytes of data. He has also contributed to developing scalable monitoring and alerting infrastructure for the company using HBase and OpenTSDB.

Venkata Deepankar Duvvuru - Rocket Fuel
Venkata is a Software Engineer at Rocket Fuel where he builds large-scale data and serving applications using Apache HBase, MapReduce, Thrift, and Clojure. In the past year, he has worked on JVM tuning to provide better latency guarantees while serving. Prior to Rocket Fuel, he interned at Google and INRIA.

5pm - 5:40pm
Elastic HBase on Mesos
Adobe has packaged HBase in Docker containers and uses Marathon and Mesos to schedule them, allowing us to decouple the RegionServer from the host, express resource requirements declaratively, and open the door for unassisted real-time deployments, elastic (up and down) real-time scalability, and more. In this talk, you'll hear what we've learned and why this approach could fundamentally change HBase operations.

Cosmin Lehene - Adobe
Cosmin is a senior computer scientist in Adobe's Analytics Platform team, working on distributed infrastructure for the Adobe Marketing Cloud. His past work includes a real-time distributed OLAP cube on top of HBase, a real-time video QoS analytics service, and Adobe Analytics Video Heartbeats.

DeathStar: Easy, Dynamic, Multi-tenant HBase via YARN
In this talk, you'll learn about the HBase access patterns and multi-tenancy scenarios that Rocket Fuel has developed, and about the role of DeathStar, an in-house solution built on top of Apache Slider and YARN. We'll cover how we use a single YARN cluster to host multiple smaller and highly customized HBase clusters, and how dynamic provisioning and elastic scaling are made possible in this model.

Nitin Aggarwal - Rocket Fuel
Nitin is a Software Engineer at Rocket Fuel where he builds data applications using Apache HBase, MapReduce, YARN, and Apache Storm to enable faster access and easier analysis of petabytes of data. He has also contributed to developing scalable monitoring and alerting infrastructure for the company using HBase and OpenTSDB.

Ishan Chhabra - Rocket Fuel
Ishan is a Technical Lead at Rocket Fuel, with a focus on building the next generation of real-time storage and processing systems. Hadoop, HBase, Storm and Clojure are his tools of choice for tackling complexity and scalability challenges of storing and analyzing petabytes of data generated and stored at Rocket Fuel. Prior to Rocket Fuel, he worked at Bell Labs to enable privacy in large-scale recommendation systems using distributed middleware, acquiring a patent in the process.

Taming GC Pauses for Large Java Heap in HBase
In this presentation, we will introduce HotSpot's Garbage First collector (G1GC) as the most suitable collector for latency-sensitive applications running in large-memory environments. We will first discuss G1GC internal operations and tuning opportunities, and then cover tuning flags that set desired GC pause targets, change adaptive GC thresholds, and adjust GC activities at runtime. We will provide several HBase case studies using Java heaps as large as 100GB that show how best to tune applications to remove unpredictable, protracted GC pauses.

Eric Kaczmarek - Intel
Eric is a Senior Java Performance Architect in the Software Solution Group at Intel. For the better part of the last 10 years, he focused on optimizing the Java Virtual Machine for Intel Architectures. Because of his deep and broad Java Virtual Machine expertise, Eric leads the effort to enable and optimize Big Data frameworks such as Apache Hadoop and Apache HBase for Intel-based platforms.

Liqi Yi - Intel
Liqi is a senior Java Performance engineer at Intel's Software Solution Group. He has extensive experience with HBase performance optimization, Java Garbage Collection tuning, and hardware platform characterization.

SQL-on-HBase Smackdown: Panel
Nothing is hotter than SQL-on-Hadoop, and now SQL-on-HBase is fast approaching equal hotness status. In this session, a panel of developers deeply involved in this effort will discuss the work done so far across the ecosystem and the work still to be done.

Julian Hyde - Hortonworks
Julian, an architect at Hortonworks, is an expert in database architecture, query optimization, and in-memory analytics. He is the original developer of the Apache Calcite query-planning framework, an Apache Drill committer, and lead developer of the Mondrian OLAP engine.

Rohit Jain - Hewlett-Packard
Rohit, Database Distinguished & Chief Technologist at HP, leads an effort to build Big Data Apache Hadoop solutions while leveraging Apache HBase. He has also served as a solutions architect and a database consultant, developer, architect, development and QA manager, and product manager.

Dr. Ricardo Jimenez-Peris - LeanXcale
Ricardo is CEO and cofounder of LeanXcale. He is an expert on scalable transactions, co-author of a book on scalable database replication and of 100+ papers in international conferences and journals, and co-inventor of several patents. He is a member of the expert group advising the European Commission on Cloud Computing.

John Leach - Splice Machine
With over 15 years of software experience under his belt, John's expertise in analytics and BI drives his role as CTO. Prior to Splice Machine, John founded Incite Retail and led the company's strategy and development efforts. Prior to Incite Retail, he ran the business intelligence practice at Blue Martini Software and built strategic partnerships with integration partners.

James Taylor - Salesforce.com
James is an architect at Salesforce.com in the Big Data Group. He founded the Apache Phoenix project and leads the development effort. Prior to working at Salesforce.com, James worked at BEA Systems on projects such as a federated query processing system and an event-driven programming platform, and he has worked at various other start-ups in the computer industry over the past 20+ years.

NRT Event Processing with Guaranteed Delivery of HTTP Callbacks
At Salesforce, we are building a new service, code-named Webhooks, that enables our customers' own systems to respond in near real-time to system events and customer behavioral actions from the Salesforce Marketing Cloud. The application should process millions of events per day to address the current needs and scale up to billions of events per day for future needs, so horizontal scalability is a primary concern. In this talk, we will discuss how Webhooks is built using HBase for data storage and Cask Data Application Platform (CDAP), an open source framework for building applications on Hadoop.

Alan Steckley - Salesforce.com
Alan is a Principal Software Engineer at Salesforce. He works with Hadoop and HBase to build Marketing Cloud platform services.


Poorna Chandra - Cask
Poorna is a Software Engineer at Cask where he is responsible for building software fueling the next generation of data applications. Prior to Cask, he developed Big Data infrastructure at Greenplum and Yahoo!

5:40pm - 8pm HBaseCon Party!
©2014 HBaseCon. Cloudera, Inc. All rights reserved. Terms & Conditions. Apache HBase, HBase, Apache Hadoop, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by Cloudera.