
WebSphere Commerce V7 Efficient Data Load

New Dataload Tool, To Replace Massload.

Data load means loading data from external data sources into the WebSphere Commerce database. There are typically two scenarios for data loading: initial load and delta load. Initial load is the first time you load data into the database, and it usually involves a large amount of data. Delta load is used to insert, update, and delete data after that. Delta loads can happen daily or weekly.

A new data load solution is provided in WebSphere Commerce version 7. The main purpose of this solution is to reduce the total cost of loading custom data.

This tool runs in one step rather than the multiple massload steps.

It maps CSV to the WC business objects (data services) layer and writes to the database.

  1. It uses a reader to read and parse the input.
  2. Then the business object builder maps the input to business objects.
  3. Then the business object mediator converts the business objects into physical objects that represent rows in the database.
  4. Finally, a writer writes the rows to the database.
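
To make the four stages concrete, here is a minimal sketch of the same flow in Python. It is not the WebSphere Commerce implementation: the function names, the CSV column names, and the choice of a dict of lists as the "database" are all assumptions made for illustration. The table names CATGROUP and CATGRPDESC are used only as familiar examples.

```python
import csv
import io


def read_rows(csv_text):
    """Reader: parse CSV input into one dict per record."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def build_business_object(record):
    """Business object builder: map flat input fields onto a nested object."""
    return {
        "identifier": record["GroupIdentifier"],  # hypothetical CSV column
        "description": {"name": record["Name"]},  # hypothetical CSV column
    }


def mediate(business_object):
    """Mediator: turn one business object into (table, row) physical objects."""
    return [
        ("CATGROUP", {"IDENTIFIER": business_object["identifier"]}),
        ("CATGRPDESC", {"NAME": business_object["description"]["name"]}),
    ]


def write(physical_objects, database):
    """Writer: append each row to its table (a dict of lists stands in for the database)."""
    for table, row in physical_objects:
        database.setdefault(table, []).append(row)


def load(csv_text):
    """Run reader -> builder -> mediator -> writer for every input record."""
    database = {}
    for record in read_rows(csv_text):
        write(mediate(build_business_object(record)), database)
    return database


sample = "GroupIdentifier,Name\nMEN,Men's Apparel\nWOMEN,Women's Apparel\n"
result = load(sample)
```

Note that one input record fans out into rows for two tables, which is the part the mediator hides from whoever writes the CSV.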

Limitations:

  • It can only read CSV.
  • It can only update a limited number of business objects.
  • It is less flexible and less customizable than massloader.
  • It only maps input fields to BO fields, what about any logic?  Where does that go?
    • (in massload, we can code logic in XSL or call out to Java if necessary)
  • How do we load custom tables?

With massload:

1. Generate the DTD
2. If transformation is necessary, open notepad and write an XSL
3. If loading CSV, then write a schema for it and create a manifest file
4. Run the tools, done.

Estimate: 4 hours tops

With the new “efficient” data loader:

1. Start RAD
2. Write XSD to model new BO
3. Define service module
4. Apply pattern to generate SDOs
5. Implement the BO mediators
6. Compile
7. Write configuration files for the new BO
8. Write test code for the business object
9. Start WC test server
10. Test
11. Debug, compile, restart WC test server, re-test cycle
12. Write mapping config files for the data loader
13. Run data loader, done

Estimate: 3 days, maybe more

I do not see how it is even possible to get rid of massloader, given that both the store publish and catalog import tools depend on it.

Supposedly, this is not the same as BODL, which is used by IBM’s services group.

Efficient Data Load

  • Existing dataload solution

what is data loading

initial load

delta load

massload solution

csv -> xml -> id resolve -> massload file -> wc db

several utilities provided to complete each step

issues

tedious, go through many steps

requires deep knowledge of WC db schema

need to know tables, relationship between tables

requires knowledge of id resolvable file format

operation

error prone, hard to debug

performance issue, slow for large data loading

  • New dataload solution

reduces cost of dataloading

csv file -> db

business object

need to understand business object schema

configure mapping from csv to BO using xpath
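
As a rough illustration of what "mapping from csv to BO using xpath" means, the sketch below maps CSV columns onto a nested object through slash-separated paths. This is a simplification, not real XPath and not the tool's configuration format. The column names and the paths in `mapping` are hypothetical.

```python
def set_path(target, path, value):
    """Set a value inside nested dicts, creating intermediate levels as needed."""
    parts = path.split("/")
    node = target
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = value


def build_object(record, column_to_path):
    """Build one nested business object from a flat CSV record."""
    business_object = {}
    for column, path in column_to_path.items():
        if column in record:
            set_path(business_object, path, record[column])
    return business_object


# Hypothetical mapping: CSV column -> path inside the business object.
mapping = {
    "PartNumber": "CatalogEntryIdentifier/ExternalIdentifier/PartNumber",
    "Name": "Description/Name",
    "Price": "ListPrice/Price",
}

entry = build_object(
    {"PartNumber": "SKU-1", "Name": "Shirt", "Price": "19.99"}, mapping
)
```

The point is that the person doing the load only needs to know the business object structure, not which tables the fields end up in.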

benefits:

high performance – no intermediate files

scalability – large files

customizable and extendable

business object based – transaction commit or rollback within BO boundary

better diagnostic and error reporting

components

data loader – execution flow, jdbc batch execution, error/summary reporting

customer reader layer

data reader

business object builder

*** difference BODL –

design perspective, both based on BO

V7 dataload – catalog price, inventory components

BODL – can handle more components

V7 data reader – only CSV reader

BODL data readers – has more readers

BODL is customizable

BODL performance is better

  • Components in data load framework

Business Object Layer

Business Object Mediator

ID resolver

Persistence Layer

Data Writer

Native DB Data Writer

*** Oracle supported

JDBC Data Writer

Business Context Service

global info: store id, store language

The data load and the high level data load flow

1.  data loader:

requires configuration file

2.  customer reader layer

in V7, only CSV reader supported, can write own custom reader, tutorial available

3.  business object builder

business object config file

which object to build

how to build it

BO passed to BO layer

4.  BO mediator

convert to physical object (java rep of db load)

physical object represents row in db

5. id resolver called to resolve id

6. physical objects written to db (or to db load file) by data writer

  • Dataload scenarios

initial load

large input data

may use native database load or jdbc load

dataload mode is typically “insert”

may specify the key range

will not query keys table

better performance

delta load

use “replace” mode

should not specify key range

use jdbc load

can be loaded to staging or production servers
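
The difference between the two modes can be sketched with an in-memory SQLite database. This only illustrates the idea of "insert" versus "replace"; SQLite's `INSERT OR REPLACE` stands in for the replace mode here, and the real tool's behavior may differ in detail. The `item` table and the `load_rows` helper are made up for the example.

```python
import sqlite3


def load_rows(connection, rows, mode):
    """Load (partnumber, name) rows in either 'insert' or 'replace' mode."""
    if mode == "insert":
        # Initial load: every row is expected to be new.
        sql = "INSERT INTO item (partnumber, name) VALUES (?, ?)"
    elif mode == "replace":
        # Delta load: overwrite rows that exist, add rows that do not.
        sql = "INSERT OR REPLACE INTO item (partnumber, name) VALUES (?, ?)"
    else:
        raise ValueError("unknown mode: " + mode)
    with connection:  # commits on success, rolls back on error
        connection.executemany(sql, rows)


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE item (partnumber TEXT PRIMARY KEY, name TEXT)")

# Initial load into an empty table.
load_rows(connection, [("SKU-1", "Shirt"), ("SKU-2", "Hat")], "insert")

# Delta load: SKU-2 changes, SKU-3 is new.
load_rows(connection, [("SKU-2", "Cap"), ("SKU-3", "Scarf")], "replace")

items = connection.execute(
    "SELECT partnumber, name FROM item ORDER BY partnumber"
).fetchall()
```

Running the delta rows in "insert" mode instead would fail on SKU-2 with a primary key violation, which is why insert mode suits only a clean initial load.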

  • wc-dataload.xml

data load environment: wc-dataload-env.xml

loadorder

maxerror

commitcount

batchsize

dataloadmode=”Replace”

loaditem

name

BO config file

startkey, endkey

datasource location = “cataloggroup.csv”
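
To show how settings like commitcount and maxerror might interact, here is a small sketch of a load loop. The exact semantics are my assumption, not taken from the product documentation: the loop commits after every `commit_count` successful records and stops once `max_error` failures have been seen. `run_load`, `write_record`, and `commit` are hypothetical names, and batchsize is not modeled.

```python
def run_load(records, write_record, commit, commit_count, max_error):
    """Write records, committing every commit_count successes.

    Stops as soon as max_error failures have occurred (assumed semantics).
    """
    pending = 0
    errors = 0
    loaded = 0
    for record in records:
        try:
            write_record(record)
        except ValueError:
            errors += 1
            if errors >= max_error:
                break
            continue
        pending += 1
        loaded += 1
        if pending == commit_count:
            commit()
            pending = 0
    if pending:
        # Commit whatever is left over from the last partial group.
        commit()
    return {"loaded": loaded, "errors": errors}


commits = []
written = []


def write_record(record):
    """Stand-in writer: negative numbers play the role of bad input rows."""
    if record < 0:
        raise ValueError("bad record")
    written.append(record)


def commit():
    """Stand-in commit: remember how many records were written at commit time."""
    commits.append(len(written))


summary = run_load(
    [1, 2, -1, 3, 4, 5], write_record, commit, commit_count=2, max_error=2
)
```

A larger commit count means fewer commits but more work lost or redone when something fails, so it is a trade-off between speed and how much you are willing to reload.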

Stay tuned for the next topic – Web 2.0 Starter Stores (Madisons B2C & Elite B2B)

The Madisons starter store improves on the Web 2.0 store through more convenient packaging, improvements in store functionality to aid shoppers in their shopping experience, and improvements to store design to reduce development costs.

The Elite Web 2.0 based B2B starter store provides a rich experience and streamlined checkout process to enable B2B businesses to deliver a B2C-like shopping experience.