A recipe for executing Weka in Hadoop.

Project Info

  • Status: Compatible with Weka version: >= 3.7.10
  • Roadmap: Future work - distributed clustering, distributed recommendation engine, pre-processing for text mining, oversampling for minority classes
  • Availability: Open Source

This package for Weka >= 3.7.10 provides several jobs for executing learning tasks inside of Hadoop. These include:

  1. Determining ARFF meta data and summary statistics
  2. Computing a correlation or covariance matrix
  3. Training a Weka classifier or regressor
  4. Generating randomly shuffled (and stratified) input data chunks
  5. Evaluating a Weka classifier or regressor via cross-validation or a hold-out set
  6. Scoring using a trained classifier or regressor
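
Summary statistics, the first job in the list above, suit map-reduce because per-chunk aggregates (count, sum, sum of squares, min, max) can be merged without revisiting the data. The following is a minimal sketch of that idea only, not the package's implementation; the class `SummaryStatsSketch` and its `map` and `reduce` methods are hypothetical names chosen for illustration, and plain Java arrays stand in for Hadoop input splits.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration; not part of distributedWekaHadoop. */
public class SummaryStatsSketch {

    /** Partial aggregate for one numeric attribute over one data chunk. */
    static final class Partial {
        final long count;
        final double sum;
        final double sumSq;
        final double min;
        final double max;

        Partial(long count, double sum, double sumSq, double min, double max) {
            this.count = count;
            this.sum = sum;
            this.sumSq = sumSq;
            this.min = min;
            this.max = max;
        }

        /** Arithmetic mean; only meaningful when count > 0. */
        double mean() {
            return sum / count;
        }

        /** Sample variance (n - 1 denominator); only meaningful when count > 1. */
        double variance() {
            return (sumSq - (sum * sum) / count) / (count - 1);
        }
    }

    /** "Map" step: summarize a single chunk of attribute values. */
    static Partial map(double[] chunk) {
        double sum = 0.0;
        double sumSq = 0.0;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : chunk) {
            sum += v;
            sumSq += v * v;
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return new Partial(chunk.length, sum, sumSq, min, max);
    }

    /** "Reduce" step: merge the per-chunk aggregates into one. */
    static Partial reduce(List<Partial> partials) {
        long count = 0;
        double sum = 0.0;
        double sumSq = 0.0;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (Partial p : partials) {
            count += p.count;
            sum += p.sum;
            sumSq += p.sumSq;
            min = Math.min(min, p.min);
            max = Math.max(max, p.max);
        }
        return new Partial(count, sum, sumSq, min, max);
    }

    public static void main(String[] args) {
        // Two chunks standing in for two input splits of the same attribute.
        Partial a = map(new double[] {1.0, 2.0, 3.0});
        Partial b = map(new double[] {4.0, 5.0});
        Partial total = reduce(Arrays.asList(a, b));
        System.out.println("count=" + total.count
                + " mean=" + total.mean()
                + " variance=" + total.variance()
                + " min=" + total.min
                + " max=" + total.max);
    }
}
```

The sum-of-squares form is used here because it merges trivially; it can lose precision on large, tightly clustered values, so a production job might prefer a numerically stabler merge.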

A full-featured command line interface is available along with GUI Knowledge Flow components for job orchestration. Predictive models learned in Hadoop are fully compatible with Pentaho Data Integration's "Weka Scoring" transformation step.

More information on what is available in the distributed Weka package, and how it is implemented, can be found in a three-part blog posting:

Try it out!

Open Weka's package manager (GUIChooser->Tools->Package manager) and install "distributedWekaHadoop".
