h2. Pentaho and Hadoop - Visual Development, Data Integration, Immediate Insight
Pentaho Business Analytics provides easy-to-use visual development tools and big data analytics that let users prepare, model, visualize, and explore structured and unstructured data sets in Hadoop. Pentaho simplifies the end-to-end Hadoop data life cycle by providing a complete platform, from data preparation to predictive analytics. Pentaho is unique in providing in-Hadoop execution, running transformations inside the cluster for fast performance.
!PTHOandHadoop.png|align=center!
*For a complete overview of using Pentaho and Hadoop, visit [PentahoBigData.com/ecosystem/platforms/hadoop|http://www.pentahobigdata.com/ecosystem/platforms/hadoop].*
Pentaho integrates with Hadoop at several levels:
* *Traditional ETL* - A graphical designer for visually building transformations that read data from, and write data to, Hadoop and other sources, transforming it along the way. No coding is required, unless you want to. Transformation steps include:
** HDFS File Read and Write
** HBase Read and Write
** Hive and Hive2 SQL Query and Write
** Impala SQL Query and Write
** Avro File Format and Snappy Compression Support
* *Data Orchestration* - A graphical designer for visually building and scheduling jobs that orchestrate processing, data movement, and most other aspects of operationalizing your data preparation. No coding is required, unless you want to. Job steps include:
** HDFS File Copy
** MapReduce Job Execution
** Pig Script Execution
** Amazon EMR Job Execution
** Oozie Integration
** Sqoop Import/Export
** Pentaho MapReduce Execution
h2. In-Cluster ETL
{composition-setup}{composition-setup}{deck:id=MyDeck|class=tan}
{card:label= 1) Pentaho MapReduce with Kettle}
The first three videos compare creating and executing a simple MapReduce job with Pentaho Kettle against solving the same problem in Java. The Kettle transformation shown here runs as the Mapper and Reducer inside the cluster.
{youtube}KZe1UugxXcs{youtube}
{card}
{card:label= 2) Straight Java}
What would the task from "1) Pentaho MapReduce with Kettle" look like if you coded it in Java? At half an hour long, this is a video you may not want to watch in full.
{youtube}cfFq1XB4kww{youtube}
{card}
{card:label= 3) Compare using Kettle to Java}
This video briefly summarizes the previous two, "1) Pentaho MapReduce with Kettle" and "2) Straight Java", and explains why Pentaho Kettle improves productivity and maintainability.
{youtube}ZnyuTICOrhk{youtube}
{card}
{card:label=Loading Data into Hadoop}
A quick example of loading data into the Hadoop Distributed File System (HDFS) using Pentaho Kettle.
{youtube}Ylekzmd6TAc{youtube}
{card}
{card:label=Extracting Data from Hadoop}
A quick example of extracting data from the Hadoop Distributed File System (HDFS) using Pentaho Kettle.
{youtube}3Xew58LcMbg{youtube}
{card}
{deck}
h1. Hadoop Topics
{pagetree:root=@self|sort=position|excerpt=true|reverse=false|startDepth=2|expandCollapseAll=true}