YSmart is a correlation-aware SQL-to-MapReduce translator built on top of the Hadoop platform. Given a SQL query and the related table schemas, YSmart automatically translates the query into a series of Hadoop MapReduce programs written in Java. Compared to other SQL-to-MapReduce translators, YSmart has been shown to have the following advantages:
- High Performance. The MapReduce programs generated by YSmart are optimized. YSmart can automatically detect and utilize intra-query correlations when translating a query. This correlation-aware ability significantly reduces redundant computation, unnecessary disk IO operations and network overhead. See the Performance page to learn the performance benefits of YSmart.
- High Extensibility. YSmart is designed to be easy to modify and extend. Most of YSmart is implemented in Python, which makes the code much easier to understand. Because YSmart is modular and written in a scripting language, users can easily modify existing functionality or add new functionality.
- High Flexibility. YSmart can run in two different modes: translation-mode and execution-mode. In translation-mode, YSmart only translates the query into Java code, while in execution-mode YSmart also compiles and executes the generated code. Because of this flexibility, users can easily read, modify and customize the generated code.
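As a rough illustration of the correlation idea in the first point above, the toy Python sketch below compares two aggregations over the same table and the same grouping key, run first as separate jobs and then as one merged job that scans and groups the data once. It is not YSmart's code or its actual detection algorithm; the row data, column names, and function names are hypothetical, and plain Python loops stand in for the map, shuffle, and reduce phases.

```python
from collections import defaultdict

# Toy rows of (partkey, quantity); names and values are illustrative only.
ROWS = [(1, 10), (1, 30), (2, 5), (2, 15), (2, 10)]


def separate_jobs(rows):
    """Two independent jobs: each one scans and groups the input itself."""
    scans = 0

    sums = defaultdict(int)
    for key, qty in rows:      # job 1: SUM(quantity) GROUP BY partkey
        sums[key] += qty
    scans += 1

    counts = defaultdict(int)
    for key, _ in rows:        # job 2: COUNT(*) GROUP BY partkey
        counts[key] += 1
    scans += 1

    return dict(sums), dict(counts), scans


def merged_job(rows):
    """One job: a single scan and grouping feeds both aggregates."""
    groups = defaultdict(list)
    for key, qty in rows:      # shared "map" and "shuffle" on partkey
        groups[key].append(qty)

    # Shared "reduce": both aggregates are computed from the same groups.
    sums = {k: sum(v) for k, v in groups.items()}
    counts = {k: len(v) for k, v in groups.items()}
    return sums, counts, 1
```

Both versions return the same sums and counts, but the merged one reads and partitions the input once instead of twice, which is the kind of saving a correlation-aware translation aims for.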
YSmart is an independent open source project released under the Apache 2.0 license. It is also intended as a teaching and learning tool for executing queries on top of Hadoop. Currently, YSmart supports only a subset of the features of SQL SELECT queries, and it is still under continuous development. If you have any questions or suggestions, please email the authors at firstname.lastname@example.org.
YSmart has been merged into Apache Hive.
- July 18, 2013: YSmart has been merged into Apache Hive. Here is the Link.
- June 20, 2013: YSmart Release 13.06 is available for public usage.
- March 12, 2012: YSmart Release 12.03 is available for public usage.
- Jan. 1, 2012: YSmart Release 12.01 is available for public usage.
- Source code (Linux 32bit or 64bit): YSmart-1306.tar.gz
To start using YSmart, you can try the YSmart Online Version or download and install YSmart on your own computer.
YSmart is very easy to configure. You can use YSmart if you have a Linux system with Python installed.
- Step1. Download and extract the source code YSmart-1306.tar.gz
- Step2. Translate an example TPC-H query (Q17): python translation.py test/tpch_test/17.sql test/tpch_test/tpch.schema
Example Usage of YSmart:
After Step2, a directory named "result" will be created which contains:
- a script file named "testquery.script" which specifies how to compile and run the generated code on Hadoop.
- a directory named "YSmartCode" which contains the generated Java source code files.
- a directory named "YSmartJar" which can be used to store the Jar file after compiling the Java code.
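The steps above can be scripted. The following Python sketch wraps the translation command from Step2 and then checks for the three entries listed above. The helper names (`build_translate_command`, `translate`, `check_result_layout`) and the `ysmart_dir` parameter are hypothetical; only the command line and the entry names come from this page. It is also an assumption that the "result" directory is created under the directory where the command is run.

```python
import os
import subprocess

# Entries this page says appear inside "result" after Step2.
EXPECTED_ENTRIES = ("testquery.script", "YSmartCode", "YSmartJar")


def build_translate_command(sql_file, schema_file, python_exe="python"):
    """Return the Step2 command line as an argument list."""
    return [python_exe, "translation.py", sql_file, schema_file]


def translate(ysmart_dir, sql_file, schema_file):
    """Run the translation inside the extracted source directory (hypothetical helper)."""
    subprocess.run(build_translate_command(sql_file, schema_file),
                   cwd=ysmart_dir, check=True)


def check_result_layout(result_dir="result"):
    """Map each expected entry name to whether it exists under result_dir."""
    return {name: os.path.exists(os.path.join(result_dir, name))
            for name in EXPECTED_ENTRIES}
```

For example, `translate("YSmart", "test/tpch_test/17.sql", "test/tpch_test/tpch.schema")` followed by `check_result_layout("YSmart/result")` would report which of the three entries were produced. The `YSmart` directory name here is a placeholder for wherever the archive was extracted.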
Detailed usage of YSmart is described here.
- Rubao Lee, Tian Luo, Yin Huai, Fusheng Wang, Yongqiang He, and Xiaodong Zhang. YSmart: Yet another SQL-to-MapReduce Translator. Proceedings of 31st International Conference on Distributed Computing Systems (Best Paper Award in ICDCS 2011), Minneapolis, Minnesota, June 20-24, 2011.
- Yin Huai, Rubao Lee, Simon Zhang, Cathy H. Xia, and Xiaodong Zhang. DOT: a matrix model for analyzing, optimizing and deploying software for big data analytics in distributed systems. Proceedings of 2nd ACM Symposium on Cloud Computing (SOCC 2011), Cascais, Portugal, October 27-28, 2011.
- Yongqiang He, Rubao Lee, Yin Huai, Zheng Shao, Namit Jain, Xiaodong Zhang, Zhiwei Xu. RCFile: a fast and space-efficient data placement structure in MapReduce-based warehouse systems. Proceedings of International Conference on Data Engineering (ICDE 2011), Hannover, Germany, April 11-16, 2011.