h2. Pentaho and Hadoop - Visual Development, Data Integration, Immediate Insight

Pentaho Business Analytics provides easy-to-use visual development tools and big data analytics that empower users to easily prepare, model, visualize and explore structured and unstructured data sets in Hadoop. Pentaho simplifies the end-to-end Hadoop data life cycle by providing a complete platform from data preparation to predictive analytics. Pentaho is unique in providing in-Hadoop execution for extremely fast performance.

!PTHOandHadoop.png|align=center!
*For a complete overview of using Pentaho and Hadoop, visit [PentahoBigData.com/ecosystem/platforms/hadoop|http://www.pentahobigdata.com/ecosystem/platforms/hadoop].*

Pentaho is integrated with Hadoop at many levels:
* *Traditional ETL* - Graphical designer to visually build transformations that read and write data in Hadoop from/to anywhere and transform the data along the way. No coding required - unless you want to; a sketch of the hand-coded route follows this list. Transformation steps include...
** HDFS File Read and Write
** HBase Read/Write
** Hive and Hive2 SQL Query and Write
** Impala SQL Query and Write
** Support for the Avro file format and Snappy compression
* *Data Orchestration* - Graphical designer to visually build and schedule jobs that orchestrate processing, data movement and most aspects of operationalizing your data preparation. No coding required - unless you want to; an Oozie client sketch follows this list. Job steps include...
** HDFS File Copy
** MapReduce Job Execution
** Pig Script Execution
** Amazon EMR Job Execution
** Oozie Integration
** Sqoop Import/Export
** Pentaho MapReduce Execution
* *Pentaho MapReduce* - Graphical designer to visually build MapReduce jobs and run them in the cluster. With a simple, point-and-click alternative to writing Hadoop MapReduce programs in Java or Pig, Pentaho exposes a familiar ETL-style user interface. Hadoop becomes easily usable by IT and data scientists, not just developers with specialized MapReduce and Pig coding skills. As always, no coding required - unless you want to.
* *Traditional Reporting* - All of the data sources above can be used directly, or blended with other data, to drive our pixel-perfect reporting engine. Reports can be secured, parameterized and published to the web to provide guided ad hoc capabilities to end users, and can be mashed up with other Pentaho visualizations to create dashboards.
* *Web Based Interactive Reporting* - Pentaho's Metadata layer leverages data stored in Hive, Hive2 and Impala for WYSIWYG, interactive, self-service reporting. [More Info|http://www.pentaho.com/product/business-visualization-analytics#self-service-reports]
* *Pentaho Analyzer* - Leverage your data stored in Impala or Hive2 (Stinger) for interactive visual analysis with drill through, lasso filtering, zooming, and attribute highlighting for greater insight. [More Info|http://www.pentaho.com/product/business-visualization-analytics#visual-analysis]
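
For the Hive, Hive2 and Impala steps above, the underlying access path is plain SQL over JDBC. As a point of reference, here is a minimal sketch of that kind of query issued directly from Java through the Hive JDBC driver; the host, port, database, user and table names are placeholder assumptions, not values from this page.

{code:java}
// Minimal sketch: querying Hive via JDBC (HiveServer2), the same access path
// the Hive/Impala transformation steps use. Host, port, database, user and
// table are placeholder assumptions.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuerySketch {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://hive-host:10000/default", "etl", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT page, COUNT(*) AS hits FROM weblogs GROUP BY page")) {
            while (rs.next()) {
                System.out.println(rs.getString("page") + "\t" + rs.getLong("hits"));
            }
        }
    }
}
{code}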
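Likewise, the orchestration steps wrap APIs you could call yourself. The sketch below submits and polls a workflow through the Oozie Java client; the Oozie URL, application path and cluster addresses are placeholder assumptions.

{code:java}
// Minimal sketch: submitting a workflow with the Oozie Java client, the kind
// of orchestration the Oozie job step wraps. URL and paths are assumptions.
import java.util.Properties;
import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;

public class OozieSubmitSketch {
    public static void main(String[] args) throws Exception {
        OozieClient oozie = new OozieClient("http://oozie-host:11000/oozie");

        Properties conf = oozie.createConfiguration();
        conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode:8020/user/etl/workflow");
        conf.setProperty("nameNode", "hdfs://namenode:8020");
        conf.setProperty("jobTracker", "jobtracker-host:8021");

        String jobId = oozie.run(conf);          // submit and start the workflow
        while (oozie.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
            Thread.sleep(10 * 1000);             // poll until the workflow finishes
        }
        System.out.println("Workflow " + jobId + " finished: "
            + oozie.getJobInfo(jobId).getStatus());
    }
}
{code}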


h2. In-Cluster ETL


{composition-setup}{composition-setup}{deck:id=MyDeck|class=tan}
{card:label=1) Pentaho MapReduce with Kettle}
The first three videos compare creating and executing a simple MapReduce job with Pentaho Kettle against coding the same job in Java. The Kettle transformation shown here runs as the mapper and reducer within the cluster.
{youtube}KZe1UugxXcs{youtube}
{card}

{card:label=2) Straight Java}
What would the same task as "1) Pentaho MapReduce with Kettle" look like if you coded it in Java?  At a half hour long, you may not want to watch the entire video...
{youtube}cfFq1XB4kww{youtube}
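
For a sense of what the hand-coded route involves, here is the canonical shape of a Java MapReduce job (the classic word count) against the org.apache.hadoop.mapreduce API. It is an illustration of the approach discussed in the video, not the exact program shown there.

{code:java}
// The canonical hand-coded Hadoop job (word count) - an illustration of the
// Java coding approach, not the exact program from the video.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);   // emit (word, 1) for each token
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();            // total the counts for this word
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
{code}

Packaged into a jar, a job like this runs with hadoop jar wordcount.jar WordCount <input> <output> - the boilerplate that Pentaho MapReduce replaces with a visual transformation.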
{card}

{card:label=3) Compare using Kettle to Java}
A quick summary of the previous two videos, "1) Pentaho MapReduce with Kettle" and "2) Straight Java", explaining why Pentaho Kettle boosts productivity and maintainability.
{youtube}ZnyuTICOrhk{youtube}
{card}

{card:label=Loading Data into Hadoop}
A quick example of loading data into the Hadoop Distributed File System (HDFS) using Pentaho Kettle.
{youtube}Ylekzmd6TAc{youtube}
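
For comparison, the programmatic equivalent of this load uses the Hadoop FileSystem API; the namenode URL and paths below are placeholder assumptions, not values from the video.

{code:java}
// Minimal sketch: copying a local file into HDFS with the Hadoop FileSystem
// API - the programmatic equivalent of the Kettle load shown in the video.
// The namenode URL and paths are placeholder assumptions.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsLoadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
        fs.copyFromLocalFile(new Path("/tmp/weblogs.txt"),
                             new Path("/user/etl/weblogs/weblogs.txt"));
        fs.close();
    }
}
{code}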
{card}

{card:label=Extracting Data from Hadoop}
A quick example of extracting data from the Hadoop Distributed File System (HDFS) using Pentaho Kettle.
{youtube}3Xew58LcMbg{youtube}
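
And the programmatic counterpart of this extract: streaming a file back out of HDFS through the same FileSystem API (URL and path again assumed for illustration).

{code:java}
// Minimal sketch: streaming a file out of HDFS to stdout with the Hadoop
// FileSystem API - the programmatic counterpart of the Kettle extract.
// The namenode URL and path are placeholder assumptions.
import java.io.InputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsExtractSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
        try (InputStream in = fs.open(new Path("/user/etl/weblogs/weblogs.txt"))) {
            IOUtils.copyBytes(in, System.out, 4096, false);  // stream to stdout
        }
        fs.close();
    }
}
{code}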
{card}
{deck}



h1. Hadoop Topics
{pagetree:root=@self|sort=position|excerpt=true|reverse=false|startDepth=2|expandCollapseAll=true}