How to read data from a data source (flat file) and write it to a collection in MongoDB. By the end of this guide you should understand how data can be read from many different data sources and written to MongoDB.

The data we are going to use contains data about the flow of visitors to a web site.

Intro Video

Prerequisites

In order to follow along with this how-to guide you will need the following:

MongoDB

A single-node local cluster is sufficient for these exercises, but a larger and/or remote configuration will work as well. You will need to know the address and port that MongoDB is running on and have a user id and password for the server (if applicable).

These guides were developed using MongoDB version 2.0.2. You can find MongoDB downloads here: http://www.mongodb.org/downloads
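If you want to check the connection details from a script or another client before starting, they can be combined into a standard mongodb:// connection string. The sketch below is only an illustration: the helper name buildMongoUri, the host db.example.com, and the credentials are hypothetical. PDI itself takes these values as separate fields rather than as a URI.

```javascript
// Hypothetical helper: assembles a standard mongodb:// connection string
// from the connection details listed in the prerequisites.
function buildMongoUri({ host = "localhost", port = 27017, user, password, database = "" }) {
  // Credentials are optional; percent-encode them so characters such as
  // '@' or ':' in a password do not break the URI.
  const credentials =
    user !== undefined && password !== undefined
      ? `${encodeURIComponent(user)}:${encodeURIComponent(password)}@`
      : "";
  return `mongodb://${credentials}${host}:${port}/${database}`;
}

// Example values only: a local, unauthenticated server and the 'Demo' database.
console.log(buildMongoUri({ database: "Demo" }));

// An authenticated remote server (placeholder host and credentials).
console.log(buildMongoUri({ host: "db.example.com", user: "etl", password: "p@ss", database: "Demo" }));
```

27017 is MongoDB's default port; replace it if your server listens elsewhere.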

Pentaho Data Integration

A desktop installation of the Kettle design tool called 'Spoon'.

Download here.

Sample Files

The sample data file for this guide is called page_successions.txt.zip.

Step-By-Step Instructions

Setup

Start MongoDB if it is not running.

Create a Data Transformation

Start PDI on your desktop. Once it is running, choose 'File' -> 'New' -> 'Transformation' from the menu system, or click on the 'New file' icon on the toolbar and choose the 'Transformation' option.

Speed Tip

You can download the Kettle Transform populate_mongodb_page_successions.ktr already completed.

  1. Add a Text File Input Step: We are going to read data from a text file, so expand the 'Input' section of the Design palette and drag a 'Text file input' step onto the transformation canvas.
    [Image]
    Notice that there are many other inputs that we could have used, such as a database (including Hive), applications, and specific file formats. Under the 'Big Data' section there are other inputs, including Cassandra, HBase, and MapReduce.
    Select the file: Double-click on the 'Text file input' step to edit its properties. Click on the 'Browse' button on the right side of the dialog to select a file. Locate the page_successions.txt file. Click on the 'Add' button to add the file to the selected files list. The dialog should look something like this:
    [Image]
  2. Create Data Fields: Click on the 'Fields' tab. Then click the 'Get Fields' button. Click 'OK' to sample 100 lines. You will see the 'Scan results' window. When you close the 'Scan results' window you will see the fields filled in for you:
    [Image]
  3. Preview Data: Click on the 'Preview Rows' button and accept 1000 as the number of rows to preview. You will see a table of preview data read from the text file:
    [Image]
  4. Add a MongoDB Output Step: Close the preview window and click on 'OK' on the 'Text file input' window. On the design palette expand the 'Big Data' section and drag a 'MongoDb Output' step onto the transformation canvas. Your canvas should look like this:
    [Image]
  5. Connect the Input and Output Steps: Hover the mouse over the 'Text file input' step and a tooltip will appear.
    [Image]
    Click on the output connector (the green arrow pointing to the right) and drag a connector arrow to the 'MongoDb Output' step. Your canvas should look like this:
    [Image]
  6. Edit the MongoDB Output Step: Double-click on the 'MongoDB Output' step to edit its properties. Enter this information on the 'Configure Connection' tab:
    1. Host, port, username and password: the connection information for your MongoDB installation.
    2. Database: 'Demo'
    3. Collection: 'PageSuccessions'
    4. Truncate collection: Checked. This will empty the PageSuccessions collection before adding the incoming data.
    [Image]
    On the 'Mongo document fields' tab click on the 'Get Fields' button to populate the table.
    [Image]
    On the 'Create/drop indexes' tab, specify that we want to create an index on the 'url' field.
    [Image]
    Click 'OK' to close the window.
  7. Save the Transformation: Choose 'File' -> 'Save as...' from the menu system. Save the transformation as 'populate_mongodb_page_successions.ktr' into a folder of your choice.
  8. Run the Transformation: Choose 'Action' -> 'Run' from the menu system or click on the green run button on the transformation toolbar. An 'Execute a transformation' window will open. Click on the 'Launch' button. An 'Execution Results' panel will open at the bottom of the PDI window and it will show you the progress of the transformation as it runs. After a few seconds the transformation should finish successfully:
    [Image]

If any errors occurred, the transformation step that failed will be highlighted in red, and you can use the 'Logging' tab to view the error messages.
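For readers who find it easier to reason about the transformation as code, the following is a rough sketch of what the 'Text file input' and 'MongoDb Output' steps do together: each line of the file becomes one document. It is not how PDI is implemented. The tab delimiter, the column order, the absence of a header row, and the function names are all assumptions; only the field names (key, url, nextUrl, Count) and the sample values come from this guide.

```javascript
// Field names as they appear in the loaded documents; the column order and
// the tab delimiter are assumptions about page_successions.txt.
const FIELDS = ["key", "url", "nextUrl", "Count"];

// Convert one delimited line into a document-shaped object.
function lineToDocument(line, delimiter = "\t") {
  const values = line.split(delimiter);
  if (values.length !== FIELDS.length) {
    throw new Error(`Expected ${FIELDS.length} fields but found ${values.length}`);
  }
  const doc = {};
  FIELDS.forEach((name, i) => {
    doc[name] = values[i];
  });
  // 'Count' is stored as a number in MongoDB, so parse it here as well.
  doc.Count = Number.parseInt(doc.Count, 10);
  return doc;
}

// Convert a whole file's text, skipping blank lines (assumes no header row).
function textToDocuments(text, delimiter = "\t") {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) => lineToDocument(line, delimiter));
}

// Two rows built from values that appear in this guide's sample output.
const sample =
  "--firstpage--~^~/about\t--firstpage--\t/about\t504\n" +
  "--firstpage--~^~/about/awards\t--firstpage--\t/about/awards\t80\n";

console.log(textToDocuments(sample));
```

In PDI the 'Get Fields' buttons do this mapping for you; the sketch only makes the row-to-document correspondence explicit.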

Check the MongoDB Collection

  1. Using the Mongo CLI, type:

    use Demo;
    db.PageSuccessions.find();

  2. You should see a result like this:

    { "_id" : ObjectId("4f21850e09d01689c7d9887e"), "key" : "--firstpage--~^~/about", "url" : "--firstpage--", "nextUrl" : "/about", "Count" : NumberLong(504) }
    { "_id" : ObjectId("4f21850e09d01689c7d9887f"), "key" : "--firstpage--~^~/about/awards", "url" : "--firstpage--", "nextUrl" : "/about/awards", "Count" : NumberLong(80) }
    { "_id" : ObjectId("4f21850e09d01689c7d98880"), "key" : "--firstpage--~^~/about/customers", "url" : "--firstpage--", "nextUrl" : "/about/customers", "Count" : NumberLong(667) }
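
If you would rather check the loaded documents programmatically than by eye, a small sketch along these lines may help. It works on plain objects copied from the output above (without the generated _id values) and does not connect to MongoDB. The function names are hypothetical, and the rule that key equals url + "~^~" + nextUrl is only an observation from these three rows.

```javascript
// The three documents shown above, without the generated _id values.
const docs = [
  { key: "--firstpage--~^~/about", url: "--firstpage--", nextUrl: "/about", Count: 504 },
  { key: "--firstpage--~^~/about/awards", url: "--firstpage--", nextUrl: "/about/awards", Count: 80 },
  { key: "--firstpage--~^~/about/customers", url: "--firstpage--", nextUrl: "/about/customers", Count: 667 },
];

// In the sample output each key looks like url + "~^~" + nextUrl.
// This is an observation from the rows above, not a documented rule.
function keyMatchesUrls(doc) {
  return doc.key === `${doc.url}~^~${doc.nextUrl}`;
}

// Most frequent next pages for a given url, highest Count first.
function topNextPages(documents, url, limit = 2) {
  return documents
    .filter((d) => d.url === url) // filter() copies, so docs is not reordered
    .sort((a, b) => b.Count - a.Count)
    .slice(0, limit)
    .map((d) => d.nextUrl);
}

console.log(docs.every(keyMatchesUrls));
console.log(topNextPages(docs, "--firstpage--"));
```

A comparable query in the Mongo CLI would be db.PageSuccessions.find({ url: "--firstpage--" }).sort({ Count: -1 }).limit(2), which can make use of the index created on the 'url' field.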
    

Summary

During this guide you learned how to populate a MongoDB collection using PDI's graphical design tool. You can use this tool to load data into MongoDB from many data sources.
Other guides in this series cover how to sort and group MongoDB data, create reports, and combine data from MongoDB with data from other sources.