Apache Phoenix
Developer(s) | Apache Software Foundation |
---|---|
Initial release | 22 May 2014[1] |
Stable release | |
Repository | GitHub Repository |
Written in | Java, SQL |
Operating system | Cross-platform |
Type | SQL query engine, Database middleware |
License | Apache License 2.0 |
Website | phoenix |
Apache Phoenix is an open-source, massively parallel SQL query engine that enables OLTP and operational analytics for Apache Hadoop, using Apache HBase as its backing store.[3] Phoenix provides a JDBC driver that hides the intricacies of the NoSQL store, enabling users to create, delete, and alter SQL tables, views, indexes, and sequences; insert and delete rows singly and in bulk; and query data through SQL.[4] Phoenix compiles queries and other statements into native HBase scans and orchestrates the running of those scans to produce regular JDBC result sets.[5]
Overview
Apache Phoenix enables SQL-based access to HBase data, compiling SQL queries into native HBase API calls rather than using MapReduce.[6] This approach enables low-latency queries on HBase data. Key features include:
- Standard SQL support: Supports a subset of ANSI SQL-92 with extensions[7]
- JDBC connectivity: Full JDBC driver for integration with existing tools[8]
- Schema management: DDL support for creating tables, views, and indexes[9]
- Query optimization: Cost-based optimizer with statistics collection[10]
- Secondary indexes: Global and local secondary indexes for improved query performance[11]
- Transactions: ACID transaction support through integration with Apache Tephra or Apache Omid[12]
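As a minimal sketch of how the features above look from a client, the following shows a few Phoenix-dialect statements and a helper that sends them through a cursor. The table `metrics`, its columns, and the index name are hypothetical, and `run_demo` only assumes a Python DB-API style cursor that accepts `?` parameters (for example, one obtained from a client of the Phoenix Query Server); it is an illustration, not an excerpt from the Phoenix documentation.

```python
# Illustrative Phoenix-dialect statements.
# The table, column, and index names are hypothetical.
DDL_STATEMENTS = [
    # The PRIMARY KEY constraint determines the HBase row key.
    "CREATE TABLE IF NOT EXISTS metrics ("
    "host VARCHAR NOT NULL, ts BIGINT NOT NULL, cpu DOUBLE "
    "CONSTRAINT pk PRIMARY KEY (host, ts))",
    # A secondary index on ts that also carries the cpu column.
    "CREATE INDEX IF NOT EXISTS idx_metrics_ts ON metrics (ts) INCLUDE (cpu)",
]

# Phoenix writes rows with UPSERT rather than INSERT.
UPSERT = "UPSERT INTO metrics (host, ts, cpu) VALUES (?, ?, ?)"
QUERY = "SELECT host, AVG(cpu) FROM metrics WHERE ts >= ? GROUP BY host"


def run_demo(cursor, rows, since):
    """Create the schema, write rows, and run an aggregate query.

    `cursor` is assumed to be any DB-API cursor that accepts '?' parameters.
    """
    for ddl in DDL_STATEMENTS:
        cursor.execute(ddl)
    for row in rows:
        cursor.execute(UPSERT, row)
    cursor.execute(QUERY, (since,))
    return cursor.fetchall()
```

Passing the cursor in keeps the sketch independent of any particular driver; whether statements are committed automatically depends on how the connection was opened.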
Architecture
Phoenix acts as a SQL skin over HBase, implementing a SQL query engine on top of HBase's native APIs.[13] The architecture consists of:
- Query parser: Parses SQL queries into an abstract syntax tree[14]
- Query optimizer: Optimizes queries using table statistics and cost-based optimization[15]
- Query plan executor: Executes the optimized plan using HBase scans and filters[16]
- Metadata management: Stores table metadata in HBase system tables[17]
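The idea of executing a plan as HBase scans can be sketched with a toy model. Below, a range predicate on the row key is turned into scan bounds, and a sorted in-memory list stands in for an HBase table. The function names are hypothetical and the code is not taken from Phoenix; it only illustrates why a predicate on the leading key columns can be answered by reading a contiguous slice of rows instead of the whole table.

```python
from bisect import bisect_left


def compile_key_range(lower, upper):
    """Turn the predicate `lower <= key < upper` into (start_row, stop_row).

    The start row is inclusive and the stop row is exclusive, which matches
    the default bounds of an HBase scan.
    """
    return lower.encode("utf-8"), upper.encode("utf-8")


def range_scan(sorted_rows, start_row, stop_row):
    """Yield (key, value) pairs with start_row <= key < stop_row.

    `sorted_rows` is a list of (bytes, value) pairs sorted by key; it is a
    stand-in for a table whose rows are stored in row-key order.
    """
    keys = [key for key, _ in sorted_rows]
    i = bisect_left(keys, start_row)  # seek to the first row in range
    while i < len(sorted_rows) and sorted_rows[i][0] < stop_row:
        yield sorted_rows[i]
        i += 1
```

For example, with keys `user000` through `user009`, compiling the range from `user003` to `user006` and scanning it visits only the three rows `user003`, `user004`, and `user005`.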
History
Phoenix began as an internal project by Salesforce in 2012, created by James Taylor and his team to provide SQL access to HBase for Salesforce's internal use cases.[18] It was open-sourced on GitHub on 29 January 2013.[19]
The project entered the Apache Incubator in December 2013[20] and graduated to become a top-level Apache project on 22 May 2014.[1]
Adoption
Apache Phoenix has been adopted by various organizations and is included in major Hadoop distributions:
- Cloudera: Included in Cloudera Data Platform (CDP) 7.0 and above[21]
- Hortonworks: Included in Hortonworks Data Platform (HDP) 2.1 and above[22]
- Amazon EMR: Available as an optional application[23]
- Azure HDInsight: Included in HBase clusters[24]
Features
SQL Support
Phoenix supports a subset of standard SQL including:[25]
- SELECT, UPSERT, DELETE statements
- CREATE, DROP, ALTER for tables, views, and indexes
- GROUP BY, ORDER BY, HAVING clauses
- Joins (inner, left, right, full)
- Subqueries
- Derived tables
- CASE expressions
- User-defined functions
Performance Features
Phoenix includes several performance optimizations:[26]
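- Skip scan filter: Seeks directly between the row key values or ranges a query asks for (for example, an IN list on a leading key column) instead of reading every row in between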
- Salted tables: Prevents region server hotspotting[27]
- Statistics-based query optimization: Uses table statistics for query planning
- Parallel execution: Leverages HBase coprocessors for server-side operations[28]
- Index support: Both mutable and immutable secondary indexes[29]
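Salting from the list above can be sketched as follows. The idea is to prefix each row key with a single byte derived from a hash of the key, so that monotonically increasing keys (such as timestamps) spread across several key ranges instead of landing on one region. The function name `salt_key` is hypothetical and `zlib.crc32` is only a stand-in hash chosen for this illustration; it is an assumption, not the hash Phoenix itself uses.

```python
import zlib


def salt_key(row_key: bytes, buckets: int) -> bytes:
    """Prefix row_key with one salt byte in the range [0, buckets)."""
    if not 1 <= buckets <= 256:
        raise ValueError("buckets must fit in a single byte")
    # crc32 is a stand-in hash (an assumption made for this sketch).
    salt = zlib.crc32(row_key) % buckets
    return bytes([salt]) + row_key


# Sequential timestamp-like keys would otherwise sort next to each other.
keys = [f"2024-01-01T00:00:{i:02d}".encode() for i in range(60)]
salted = [salt_key(key, 8) for key in keys]
buckets_used = {s[0] for s in salted}
```

The trade-off is on the read side: because adjacent logical keys now carry different prefixes, a range query has to scan each bucket and merge the results.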
Integration Features
Phoenix integrates with various big data tools:[30]
- Apache Spark: Phoenix-Spark connector for DataFrame integration[31]
- Apache Pig: Phoenix Pig loader for ETL operations[32]
- Apache Flume: Phoenix Flume sink for real-time data ingestion[33]
- Apache Kafka: Phoenix-Kafka connector for streaming data[34]
- SQuirreL SQL: GUI client support through JDBC[35]
Use Cases
Common use cases for Apache Phoenix include:[36]
- Time-series data analysis
- Operational reporting on HBase data
- ETL processing with SQL
- Real-time data serving layer
- Ad-hoc querying of big data
Organizations using Phoenix include:
- Salesforce: Original creator; uses Phoenix for internal analytics[37]
- Bloomberg: Financial data analytics
- eBay: Real-time analytics platform
- Huawei: Telecom data processing
Limitations
Phoenix has certain limitations compared to traditional relational databases:[38]
- No cross-row or cross-table transactions unless a transaction manager such as Apache Tephra or Apache Omid is configured
- Limited SQL feature set compared to traditional RDBMS
- Performance dependent on underlying HBase cluster
- No support for triggers or stored procedures
References
1. "The Apache Software Foundation Announces Apache Phoenix as a Top-Level Project". Apache Software Foundation. 22 May 2014. Retrieved 28 May 2025.
2. "Apache Phoenix Downloads". Apache Software Foundation. Retrieved 28 May 2025.
3. "Apache Phoenix: OLTP and operational analytics for Apache Hadoop". Apache Software Foundation. Retrieved 28 May 2025.
4. "Frequently Asked Questions". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
5. James Taylor (4 June 2014). Apache Phoenix: Transforming HBase into a SQL database. HBaseCon 2014. Cloudera. Retrieved 28 May 2025.
6. "Phoenix in 15 minutes or less". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
7. "Phoenix SQL Grammar". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
8. "Phoenix Query Server". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
9. "Schema Information". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
10. "Statistics Collection". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
11. "Secondary Indexing". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
12. "Transactions". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
13. Dimiduk, Nick; Khurana, Amandeep (2012). HBase in Action. Manning Publications. pp. 312–315. ISBN 978-1617290527.
14. "How to Develop". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
15. "Explain Plan". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
16. "Tuning". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
17. "Upgrade". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
18. Abel Avram (31 January 2013). "Phoenix: Running SQL Queries on Apache HBase". InfoQ. Retrieved 28 May 2025.
19. James Taylor (29 January 2013). "Initial commit". GitHub. Retrieved 28 May 2025.
20. "Phoenix Incubation Status". Apache Software Foundation. Archived from the original on 17 May 2014. Retrieved 28 May 2025.
21. "Overview of Apache Phoenix". Cloudera. Retrieved 28 May 2025.
22. "Apache Phoenix Performance Tuning". Hortonworks. Retrieved 28 May 2025.
23. "Apache Phoenix". Amazon EMR Release Guide. Amazon Web Services. Retrieved 28 May 2025.
24. "Use Apache Phoenix and SQLLine with HBase in HDInsight". Microsoft. Retrieved 28 May 2025.
25. "Phoenix SQL Reference". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
26. "Performance". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
27. "Salted Tables". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
28. "Dynamic Columns". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
29. "Secondary Indexes". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
30. "Ecosystem". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
31. "Phoenix Spark Plugin". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
32. "Pig Integration". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
33. "Phoenix Flume Plugin". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
34. "Phoenix Connectors". Apache Software Foundation. Retrieved 28 May 2025.
35. "SQL Client Installation". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
36. "Who is Using Apache Phoenix?". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
37. James Taylor (2013). "Open Sourcing Phoenix: A SQL Layer Over HBase". Salesforce Engineering. Retrieved 28 May 2025.
38. "FAQ - Limitations". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.