Apache Phoenix

From Wikipedia, the free encyclopedia

Apache Phoenix
Developer(s): Apache Software Foundation
Initial release: 22 May 2014[1]
Stable release:
  4.x: 4.17.2 / 25 October 2023[2]
  5.x: 5.2.0 / 12 November 2024[2]
Repository: GitHub Repository
Written in: Java, SQL
Operating system: Cross-platform
Type: SQL query engine, Database middleware
License: Apache License 2.0
Website: phoenix.apache.org

Apache Phoenix is an open-source, massively parallel SQL query engine that enables OLTP and operational analytics for Apache Hadoop, using Apache HBase as its backing store.[3] Phoenix provides a JDBC driver that hides the intricacies of the NoSQL store, enabling users to create, delete, and alter SQL tables, views, indexes, and sequences; insert and delete rows singly and in bulk; and query data through SQL.[4] Phoenix compiles queries and other statements into native HBase scans and orchestrates the running of those scans to produce regular JDBC result sets.[5]

Overview

Apache Phoenix provides SQL-based access to HBase data by compiling SQL queries into native HBase API calls rather than MapReduce jobs,[6] which allows low-latency queries. Key features include:

  • Standard SQL support: Supports a subset of ANSI SQL-92 with extensions[7]
  • JDBC connectivity: Full JDBC driver for integration with existing tools[8]
  • Schema management: DDL support for creating tables, views, and indexes[9]
  • Query optimization: Cost-based optimizer with statistics collection[10]
  • Secondary indexes: Global and local secondary indexes for improved query performance[11]
  • Transactions: ACID transaction support through integration with Apache Tephra or Apache Omid[12]
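
As a rough illustration of the secondary-index feature listed above, the sketch below models a global index as a second sorted structure keyed by the indexed value followed by the primary key, so that a lookup by a non-key column becomes a short range scan rather than a full table scan. The `ToyTable` class and its method names are hypothetical and exist only for this example; they are not part of Phoenix, which keeps its index data in HBase.

```python
import bisect


class ToyTable:
    """In-memory stand-in for a table with one global secondary index (hypothetical)."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = {}    # primary key -> row dict
        self.index = []   # sorted list of (indexed value, primary key)

    def upsert(self, pk, row):
        old = self.rows.get(pk)
        if old is not None:
            # Remove the stale index entry before writing the new one.
            stale = (old[self.indexed_column], pk)
            self.index.pop(bisect.bisect_left(self.index, stale))
        self.rows[pk] = dict(row)
        bisect.insort(self.index, (row[self.indexed_column], pk))

    def find_by_indexed(self, value):
        """Range-scan the index for one value instead of scanning every row."""
        start = bisect.bisect_left(self.index, (value,))
        result = []
        for indexed_value, pk in self.index[start:]:
            if indexed_value != value:
                break
            result.append(pk)
        return result


table = ToyTable("city")
table.upsert(1, {"city": "Oslo"})
table.upsert(2, {"city": "Lima"})
table.upsert(3, {"city": "Oslo"})
oslo_keys = table.find_by_indexed("Oslo")
```

Every write has to update both structures, which is the usual trade-off of such an index: faster reads on the indexed column in exchange for extra work on writes.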

Architecture

Phoenix acts as a SQL skin over HBase, implementing a SQL query engine on top of HBase's native APIs.[13] The architecture consists of:

  • Query parser: Parses SQL queries into an abstract syntax tree[14]
  • Query optimizer: Chooses an execution plan using table statistics and cost estimates[15]
  • Query plan executor: Executes the optimized plan using HBase scans and filters[16]
  • Metadata management: Stores table metadata in HBase system tables[17]
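
The parse, plan, and execute stages listed above can be sketched in miniature. The example below is a toy under stated assumptions, not Phoenix's implementation: it accepts a single fixed query shape, uses a Python dictionary in place of an HBase table, and every function name (`parse`, `plan`, `execute`, `query`) is invented for illustration. It shows only the central idea that a predicate on the row key can be turned into a bounded scan with an inclusive start row and an exclusive stop row.

```python
import bisect
import re

# Toy "table": row keys are kept sorted, as HBase keeps rows ordered by key.
ROWS = {"a1": 10, "a2": 20, "b1": 30, "c1": 40}
SORTED_KEYS = sorted(ROWS)


def parse(sql):
    """Parse step: extract the key bounds from one fixed query shape."""
    pattern = r"SELECT \* FROM t WHERE pk >= '(\w+)' AND pk < '(\w+)'"
    match = re.fullmatch(pattern, sql.strip())
    if match is None:
        raise ValueError("unsupported query shape in this sketch")
    return {"lower": match.group(1), "upper": match.group(2)}


def plan(ast):
    """Plan step: turn the key predicate into a scan range [start, stop)."""
    return {"start_row": ast["lower"], "stop_row": ast["upper"]}


def execute(scan):
    """Execute step: read only keys inside the range, never the whole table."""
    lo = bisect.bisect_left(SORTED_KEYS, scan["start_row"])
    hi = bisect.bisect_left(SORTED_KEYS, scan["stop_row"])
    return [(key, ROWS[key]) for key in SORTED_KEYS[lo:hi]]


def query(sql):
    return execute(plan(parse(sql)))


result = query("SELECT * FROM t WHERE pk >= 'a2' AND pk < 'c1'")
```

A real engine would also handle projections, filters on non-key columns, and parallel scans across regions; the sketch leaves all of that out.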

History

Phoenix began as an internal project by Salesforce in 2012, created by James Taylor and his team to provide SQL access to HBase for Salesforce's internal use cases.[18] It was open-sourced on GitHub on 29 January 2013.[19]

The project entered the Apache Incubator in December 2013[20] and graduated to become a top-level Apache project on 22 May 2014.[1]

Adoption

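Apache Phoenix has been adopted by various organizations and is included in major Hadoop distributions and managed services, including Cloudera's and Hortonworks' distributions,[21][22] Amazon EMR,[23] and Microsoft Azure HDInsight.[24]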

Features

SQL Support

Phoenix supports a subset of standard SQL including:[25]

  • SELECT, UPSERT, DELETE statements
  • CREATE, DROP, ALTER for tables, views, and indexes
  • GROUP BY, ORDER BY, HAVING clauses
  • Joins (inner, left, right, full)
  • Subqueries
  • Derived tables
  • CASE expressions
  • User-defined functions

Performance Features

Phoenix includes several performance optimizations:[26]

  • Skip scan filter: Seeks directly between matching row key ranges (for example, for an IN list on leading key columns) instead of reading the rows in between
  • Salted tables: Prevents region server hotspotting[27]
  • Statistics-based query optimization: Uses table statistics for query planning
  • Parallel execution: Leverages HBase coprocessors for server-side operations[28]
  • Index support: Both mutable and immutable secondary indexes[29]
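
To make the salting item above concrete, the sketch below prefixes each row key with a single byte computed as a hash of the key modulo the number of buckets, so that sequential keys such as timestamps are spread over several key ranges instead of piling up at one end. The `salted_key` and `bucket_counts` helpers are hypothetical, and `zlib.crc32` is only a stand-in hash chosen for this example, not the function Phoenix uses internally.

```python
import zlib


def salted_key(row_key: bytes, buckets: int) -> bytes:
    """Prefix the key with one salt byte derived from a hash of the key."""
    if not 1 <= buckets <= 256:
        raise ValueError("the salt must fit in a single byte")
    salt = zlib.crc32(row_key) % buckets  # stand-in hash, not Phoenix's own
    return bytes([salt]) + row_key


def bucket_counts(keys, buckets):
    """Count how many keys land in each salt bucket."""
    counts = [0] * buckets
    for key in keys:
        counts[salted_key(key, buckets)[0]] += 1
    return counts


# Monotonically increasing keys would otherwise be written next to each other.
keys = [f"2024-01-01T00:00:{second:02d}".encode() for second in range(60)]
counts = bucket_counts(keys, buckets=4)
```

Because the salt is derived from the key itself, a reader can recompute it for a point lookup; a range scan, however, has to visit every bucket and merge the results, which is the cost of the more even write distribution.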

Integration Features

Phoenix integrates with various big data tools,[30] including Apache Spark,[31] Apache Pig,[32] and Apache Flume.[33]

Use Cases

Common use cases for Apache Phoenix include:[36]

  • Time-series data analysis
  • Operational reporting on HBase data
  • ETL processing with SQL
  • Real-time data serving layer
  • Ad-hoc querying of big data

Organizations using Phoenix include:

  • Salesforce - Original creator, uses for internal analytics[37]
  • Bloomberg - Financial data analytics
  • eBay - Real-time analytics platform
  • Huawei - Telecom data processing

Limitations

Phoenix has certain limitations compared to traditional relational databases:[38]

  • No support for cross-row or cross-table transactions unless a transaction manager such as Apache Tephra or Apache Omid is configured
  • Limited SQL feature set compared to traditional RDBMS
  • Performance dependent on underlying HBase cluster
  • No support for triggers or stored procedures

References

  1. "The Apache Software Foundation Announces Apache Phoenix as a Top-Level Project". Apache Software Foundation. 22 May 2014. Retrieved 28 May 2025.
  2. "Apache Phoenix Downloads". Apache Software Foundation. Retrieved 28 May 2025.
  3. "Apache Phoenix: OLTP and operational analytics for Apache Hadoop". Apache Software Foundation. Retrieved 28 May 2025.
  4. "Frequently Asked Questions". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
  5. James Taylor (4 June 2014). Apache Phoenix: Transforming HBase into a SQL database. HBaseCon 2014. Cloudera. Retrieved 28 May 2025.
  6. "Phoenix in 15 minutes or less". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
  7. "Phoenix SQL Grammar". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
  8. "Phoenix Query Server". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
  9. "Schema Information". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
  10. "Statistics Collection". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
  11. "Secondary Indexing". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
  12. "Transactions". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
  13. Dimiduk, Nick; Khurana, Amandeep (2012). HBase in Action. Manning Publications. pp. 312–315. ISBN 978-1617290527.
  14. "How to Develop". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
  15. "Explain Plan". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
  16. "Tuning". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
  17. "Upgrade". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
  18. Abel Avram (31 January 2013). "Phoenix: Running SQL Queries on Apache HBase". InfoQ. Retrieved 28 May 2025.
  19. James Taylor (29 January 2013). "Initial commit". GitHub. Retrieved 28 May 2025.
  20. "Phoenix Incubation Status". Apache Software Foundation. Archived from the original on 17 May 2014. Retrieved 28 May 2025.
  21. "Overview of Apache Phoenix". Cloudera. Retrieved 28 May 2025.
  22. "Apache Phoenix Performance Tuning". Hortonworks. Retrieved 28 May 2025.
  23. "Apache Phoenix". Amazon EMR Release Guide. Amazon Web Services. Retrieved 28 May 2025.
  24. "Use Apache Phoenix and SQLLine with HBase in HDInsight". Microsoft. Retrieved 28 May 2025.
  25. "Phoenix SQL Reference". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
  26. "Performance". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
  27. "Salted Tables". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
  28. "Dynamic Columns". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
  29. "Secondary Indexes". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
  30. "Ecosystem". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
  31. "Phoenix Spark Plugin". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
  32. "Pig Integration". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
  33. "Phoenix Flume Plugin". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
  34. "Phoenix Connectors". Apache Software Foundation. Retrieved 28 May 2025.
  35. "SQL Client Installation". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
  36. "Who is Using Apache Phoenix?". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.
  37. James Taylor (2013). "Open Sourcing Phoenix: A SQL Layer Over HBase". Salesforce Engineering. Retrieved 28 May 2025.
  38. "FAQ - Limitations". Apache Phoenix. Apache Software Foundation. Retrieved 28 May 2025.