Pentaho and Hadoop - Visual Development, Data Integration, Immediate Insight

Pentaho Business Analytics provides easy-to-use visual development tools and big data analytics that let users prepare, model, visualize and explore structured and unstructured data sets in Hadoop. Pentaho simplifies the end-to-end Hadoop data life cycle by providing a complete platform, from data preparation to predictive analytics. Pentaho also executes work inside the Hadoop cluster itself, for fast performance.

For a complete overview of using Pentaho and Hadoop, visit PentahoBigData.com/ecosystem/platforms/hadoop.

Pentaho is integrated with Hadoop at many levels

  • Traditional ETL - Graphical designer to visually build transformations that read and write data in Hadoop from/to anywhere and transform the data on the way. No coding required - unless you want to. Transformation steps include...
    • HDFS file read and write
    • HBase read and write
    • Hive and Hive2 SQL query and write
    • Impala SQL query and write
    • Support for the Avro file format and Snappy compression
  • Data Orchestration - Graphical designer to visually build and schedule jobs that orchestrate processing, data movement and most aspects of operationalizing your data preparation. No coding required - unless you want to. Job steps include...
    • HDFS Copy files
    • MapReduce Job Execution
    • Pig Script Execution
    • Amazon EMR Job Execution
    • Oozie integration
    • Sqoop Import/Export
    • Pentaho MapReduce Execution
  • Pentaho MapReduce - Graphical designer to visually build MapReduce jobs and run them in the cluster. As a simple, point-and-click alternative to writing Hadoop MapReduce programs in Java or Pig, Pentaho exposes a familiar ETL-style user interface. Hadoop becomes usable by IT and data scientists, not just developers with specialized MapReduce and Pig coding skills. As always, no coding required - unless you want to.
  • Traditional Reporting - All data sources supported above can be used directly or blended with other data to drive our pixel-perfect reporting engine. The reports can be secured, parameterized and published to the web to provide guided ad hoc capabilities to end users. The reports can be mashed up with other Pentaho visualizations to create dashboards.
  • Web-Based Interactive Reporting - Pentaho's Metadata layer leverages data stored in Hive, Hive2 and Impala for WYSIWYG, interactive, self-service reporting.
  • Pentaho Analyzer - Leverage your data stored in Impala or Hive2 (Stinger) for interactive visual analysis with drill through, lasso filtering, zooming, and attribute highlighting for greater insight.

In-cluster ETL

The first three videos compare two ways of solving the same problem: building and running a simple MapReduce job with Pentaho Kettle, and writing it in Java. The Kettle transformation shown here runs as a Mapper and Reducer within the cluster.

Video (YouTube): https://www.youtube.com/watch?v=KZe1UugxXcs
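
For readers unfamiliar with the Mapper and Reducer roles mentioned above, the following is a minimal sketch of the MapReduce pattern in plain Python. It does not use the Hadoop or Kettle APIs, and the word-count task is an assumption chosen for illustration; this page does not state which problem the videos solve. The names `mapper`, `shuffle`, `reducer` and `run_job` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key. In a real cluster the framework does this
    between the map and reduce phases; here it is simulated in memory."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key: str, values: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: collapse all counts for one word into a single total."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on a single machine."""
    pairs = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(pairs)
    return dict(reducer(key, values) for key, values in sorted(grouped.items()))
```

Calling `run_job(["big data big insight", "Data in Hadoop"])` returns a dictionary of word counts. In a Pentaho MapReduce job the map and reduce logic would instead be built visually as Kettle transformations, and the cluster would handle the grouping step.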

What would the same task as in "1) Pentaho MapReduce with Kettle" look like if you coded it in Java? The video runs about half an hour, so you may not want to watch all of it.

Video (YouTube): https://www.youtube.com/watch?v=cfFq1XB4kww

This is a quick summary of the previous two videos, "1) Pentaho MapReduce with Kettle" and "2) Straight Java", and an explanation of why Pentaho Kettle boosts productivity and maintainability.

Video (YouTube): https://www.youtube.com/watch?v=ZnyuTICOrhk

A quick example of loading data into the Hadoop Distributed File System (HDFS) using Pentaho Kettle.

Video (YouTube): https://www.youtube.com/watch?v=Ylekzmd6TAc

A quick example of extracting data from the Hadoop Distributed File System (HDFS) using Pentaho Kettle.

Video (YouTube): https://www.youtube.com/watch?v=3Xew58LcMbg
