New Dataload Tool, To Replace Massload.
Data loading means loading data from external data sources into the WebSphere Commerce database. There are typically two data loading scenarios: initial load and delta load. An initial load is the first time you load data into the database, and it usually involves a large amount of data. A delta load inserts, updates, and deletes data after that, and can run daily or weekly.
WebSphere Commerce Version 7 provides a new data load solution. Its main purpose is to reduce the total cost of loading custom data.
This tool runs in a single step rather than the multiple steps that massload requires.
It maps CSV input to the WC business objects (data services) layer and writes the result to the database:
- A reader reads and parses the input.
- The business object builder then maps the input to business objects.
- The business object mediator then converts the business objects into physical objects that represent rows in the database.
- Finally, a writer writes the rows to the database.
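The four stages above can be pictured as a chain of small components. The following is a minimal Java sketch, not the WebSphere Commerce API: `DataReader`, `BusinessObjectBuilder`, `BusinessObjectMediator`, `DataWriter` and `Row` are hypothetical types named after the roles described, a business object is modeled as a plain map, and the table names are only illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the pipeline: reader -> builder -> mediator -> writer.
// None of these types are the real WebSphere Commerce classes.
final class DataLoadPipeline {

    /** Reads and parses the input into records (one record per CSV line). */
    interface DataReader { List<String[]> read(); }

    /** Maps one parsed record to a business object (modeled as a map of fields). */
    interface BusinessObjectBuilder { Map<String, String> build(String[] record); }

    /** Converts a business object into physical objects, i.e. table rows. */
    interface BusinessObjectMediator { List<Row> toPhysicalObjects(Map<String, String> bo); }

    /** Writes rows to the database (here: any sink). */
    interface DataWriter { void write(Row row); }

    /** A physical object: one row destined for one table. */
    record Row(String table, Map<String, String> columns) {}

    static void run(DataReader reader, BusinessObjectBuilder builder,
                    BusinessObjectMediator mediator, DataWriter writer) {
        for (String[] record : reader.read()) {
            Map<String, String> bo = builder.build(record);
            for (Row row : mediator.toPhysicalObjects(bo)) {
                writer.write(row);
            }
        }
    }

    public static void main(String[] args) {
        // Naive comma split: no support for quoted fields.
        DataReader reader = () -> List.of("SHIRTS,Shirts", "PANTS,Pants").stream()
                .map(line -> line.split(","))
                .toList();
        BusinessObjectBuilder builder = rec -> {
            Map<String, String> bo = new LinkedHashMap<>();
            bo.put("identifier", rec[0]);
            bo.put("name", rec[1]);
            return bo;
        };
        // One business object becomes two rows: a base row and a description row.
        BusinessObjectMediator mediator = bo -> List.of(
                new Row("CATGROUP", Map.of("IDENTIFIER", bo.get("identifier"))),
                new Row("CATGRPDESC", Map.of("NAME", bo.get("name"))));
        List<Row> written = new ArrayList<>();
        run(reader, builder, mediator, written::add);
        written.forEach(System.out::println);
    }
}
```

Keeping the stages behind separate interfaces means one stage can be swapped without touching the others, for example a custom reader in place of the CSV reader.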
Limitations:
- It can only read CSV.
- It can only update a limited number of business objects.
- It is less flexible and less customizable than massloader.
- It only maps input fields to BO fields. What about any transformation logic? Where does that go?
- (In massload, we can code logic in XSL or call out to Java if necessary.)
- How do we load custom tables?
With massload:
1. Generate the DTD
2. If transformation is necessary, open Notepad and write an XSL stylesheet
3. If loading CSV, then write a schema for it and create a manifest file
4. Run the tools, done.
Estimate: 4 hours tops
With the new “efficient” data loader:
1. Start RAD
2. Write XSD to model new BO
3. Define service module
4. Apply pattern to generate SDOs
5. Implement the BO mediators
6. Compile
7. Write configuration files for the new BO
8. Write test code for the business object
9. Start WC test server
10. Test
11. Debug, compile, restart WC test server, re-test cycle
12. Write mapping config files for the data loader
13. Run data loader, done
Estimate: 3 days, maybe more
I do not see how it is even possible to get rid of massloader, given that the store publish and catalog import tools both depend on it.
Supposedly, this is not the same as BODL, which is used by IBM’s services group.
Efficient Data Load
- Existing dataload solution
what is data loading
initial load
delta load
massload solution
csv -> xml -> id resolve -> massload file -> wc db
several utilities provided to complete each step
issues
tedious, go through many steps
requires deep knowledge of WC db schema
need to know tables, relationship between tables
requires knowledge of id resolvable file format
operation
error prone, hard to debug
performance issue, slow for large data loading
- New dataload solution
reduces cost of dataloading
csv file -> db
business object
need to understand business object schema
configure mapping from csv to BO using xpath
benefits:
high performance – no intermediate files
scalability – large files
customizable and extendable
business object based – transaction commit or rollback within BO boundary
better diagnostic and error reporting
components
data loader – execution flow, jdbc batch execution, error/summary reporting
customer reader layer
data reader
business object builder
*** differences from BODL –
design perspective, both based on BO
V7 dataload – catalog, price, inventory components
BODL – can handle more components
V7 data reader – only CSV reader
BODL data readers – has more readers
BODL is customizable
BODL performance is better
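The "configure mapping from csv to BO using xpath" item can be illustrated by pairing each CSV column name with a path into the business object and setting each value at its path. This Java sketch assumes a slash-separated, child-only path syntax and a nested-map business object; the class, column names and paths are hypothetical, and the real tool keeps this mapping in its business object config file rather than in code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: build a nested business object from one CSV row using a
// column -> path mapping. Paths support only '/'-separated child steps
// (no predicates, attributes, or functions as real XPath would allow).
final class XPathMappingSketch {

    /** Sets value at a slash-separated path, creating intermediate nodes as needed. */
    @SuppressWarnings("unchecked")
    static void setAtPath(Map<String, Object> root, String path, String value) {
        String[] steps = path.split("/");
        Map<String, Object> node = root;
        for (int i = 0; i < steps.length - 1; i++) {
            // Throws ClassCastException if a leaf value already sits on this step.
            node = (Map<String, Object>) node.computeIfAbsent(
                    steps[i], k -> new LinkedHashMap<String, Object>());
        }
        node.put(steps[steps.length - 1], value);
    }

    /** Applies every mapping (CSV column name -> BO path) to one CSV row. */
    static Map<String, Object> build(List<String> header, List<String> row,
                                     Map<String, String> mapping) {
        Map<String, Object> bo = new LinkedHashMap<>();
        for (Map.Entry<String, String> m : mapping.entrySet()) {
            int col = header.indexOf(m.getKey());
            if (col < 0) {
                throw new IllegalArgumentException("No CSV column named " + m.getKey());
            }
            setAtPath(bo, m.getValue(), row.get(col));
        }
        return bo;
    }

    public static void main(String[] args) {
        List<String> header = List.of("GroupIdentifier", "Name");
        List<String> row = List.of("SHIRTS", "Shirts");
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("GroupIdentifier", "CatalogGroupIdentifier/ExternalIdentifier/GroupIdentifier");
        mapping.put("Name", "Description/Name");
        // prints {CatalogGroupIdentifier={ExternalIdentifier={GroupIdentifier=SHIRTS}}, Description={Name=Shirts}}
        System.out.println(build(header, row, mapping));
    }
}
```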
- Components in data load framework
Business Object Layer
Business Object Mediator
ID resolver
Persistence Layer
Data Writer
Native DB Data Writer
*** Oracle supported
JDBC Data Writer
Business Context Service
global info: store id, store language
1. data loader:
requires configuration file
2. customer reader layer
in V7, only CSV reader supported, can write own custom reader, tutorial available
3. business object builder
business object config file
which object to build
how to build it
BO passed to BO layer
4. BO mediator
convert to physical object (java rep of db load)
physical object represents row in db
5. id resolver called to resolve id
6. physical objects written to db (or to db load file) by data writer
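Step 5 automates what massload users handled with the id resolvable file format: a natural key, such as an external identifier within a store, is turned into the generated primary key, reusing the existing key when the row is already in the database and allocating a new key otherwise. In this minimal sketch an in-memory map stands in for the database lookup and a counter stands in for key generation; all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical ID resolver sketch: natural key -> surrogate primary key.
final class IdResolverSketch {

    /** Natural key of a row, e.g. (identifier, store id). */
    record NaturalKey(String identifier, long storeId) {}

    private final Map<NaturalKey, Long> existing; // stands in for a lookup query on the table
    private long nextKey;                          // stands in for the key generator

    IdResolverSketch(Map<NaturalKey, Long> existingRows, long firstNewKey) {
        this.existing = new HashMap<>(existingRows);
        this.nextKey = firstNewKey;
    }

    /** Returns the existing primary key, or allocates and remembers a new one. */
    long resolve(String identifier, long storeId) {
        return existing.computeIfAbsent(new NaturalKey(identifier, storeId), k -> nextKey++);
    }

    public static void main(String[] args) {
        IdResolverSketch resolver = new IdResolverSketch(
                Map.of(new NaturalKey("SHIRTS", 10001L), 500L), 1000L);
        System.out.println(resolver.resolve("SHIRTS", 10001L)); // existing row: 500
        System.out.println(resolver.resolve("PANTS", 10001L));  // new row: 1000
        System.out.println(resolver.resolve("PANTS", 10001L));  // same row again: 1000
    }
}
```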
- Dataload scenarios
initial load
large input data
may use native database load or jdbc load
dataload mode is typically “insert”
may specify the key range
will not query keys table
better performance
delta load
use “replace” mode
should not specify key range
use jdbc load
can be loaded to staging or production servers
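The initial-load advice (insert mode with a key range, so the keys table is not queried) is about how new primary keys are handed out. The sketch contrasts two hypothetical allocators: one draws keys from a caller-supplied startKey..endKey range with no lookups, while the other simulates reserving blocks from a shared counter that stands in for the keys table and counts each reservation. The class names, block size, and counter model are assumptions for illustration, not the tool's implementation.

```java
// Hypothetical key allocators contrasting the two load scenarios.
final class KeyAllocationSketch {

    interface KeyAllocator { long next(); }

    /** Initial-load style: keys come from a fixed startKey..endKey range, no lookups. */
    static final class RangeAllocator implements KeyAllocator {
        private final long endKey;
        private long current;

        RangeAllocator(long startKey, long endKey) {
            this.current = startKey;
            this.endKey = endKey;
        }

        @Override public long next() {
            if (current > endKey) {
                throw new IllegalStateException("Key range exhausted at " + endKey);
            }
            return current++;
        }
    }

    /** Delta-load style: reserves blocks from a shared counter (simulated keys table). */
    static final class CounterAllocator implements KeyAllocator {
        private final long[] sharedCounter; // stands in for the keys-table row
        private final int blockSize;
        private long current;
        private long blockEnd;              // exclusive end of the reserved block
        int reservations;                   // how many times the "table" was consulted

        CounterAllocator(long[] sharedCounter, int blockSize) {
            this.sharedCounter = sharedCounter;
            this.blockSize = blockSize;
        }

        @Override public long next() {
            if (current == blockEnd) {       // block used up (or none yet): reserve another
                current = sharedCounter[0];
                sharedCounter[0] += blockSize;
                blockEnd = sharedCounter[0];
                reservations++;
            }
            return current++;
        }
    }

    public static void main(String[] args) {
        KeyAllocator range = new RangeAllocator(1_000_000L, 1_000_002L);
        System.out.println(range.next() + " " + range.next() + " " + range.next());

        CounterAllocator counter = new CounterAllocator(new long[] {5000L}, 2);
        for (int i = 0; i < 5; i++) {
            counter.next();
        }
        System.out.println("reservations: " + counter.reservations); // 3
    }
}
```

A fixed range is only safe when nothing else hands out keys from the same space, which is one plausible reading of why the notes recommend it for an initial load but not for delta loads.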
- wc-dataload.xml
data load environment: wc-dataload-env.xml
loadorder
maxerror
commitcount
batchsize
dataloadmode="Replace"
loaditem
name
BO config file
startkey, endkey
datasource location = “cataloggroup.csv”
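The loadorder settings can be read as controlling a loop like the one below: rows are grouped into JDBC-style batches of batchsize, a commit is issued every commitcount rows, and the load stops once more than maxerror rows have failed. This is a database-free simulation under assumptions: the class and method names, the `failing` predicate (standing in for a row the database would reject), skipping failed rows, and the exact counting rules (whether maxerror is inclusive, whether a commit flushes a partial batch) are guesses, not the documented behavior of wc-dataload.xml.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical simulation of batchsize / commitcount / maxerror semantics.
// Nothing here calls JDBC; "executeBatch" and "commit" are just recorded events.
final class LoadOrderSketch {

    static List<String> load(List<String> rows, int batchSize, int commitCount,
                             int maxError, Predicate<String> failing) {
        List<String> events = new ArrayList<>();
        int inBatch = 0, sinceCommit = 0, errors = 0;
        for (String row : rows) {
            if (failing.test(row)) {           // failed rows are skipped, not batched
                errors++;
                events.add("error:" + row);
                if (errors > maxError) {
                    events.add("abort");
                    return events;
                }
                continue;
            }
            inBatch++;
            sinceCommit++;
            if (inBatch == batchSize) {        // flush a full batch
                events.add("executeBatch");
                inBatch = 0;
            }
            if (sinceCommit == commitCount) {  // commit, flushing any partial batch first
                if (inBatch > 0) {
                    events.add("executeBatch");
                    inBatch = 0;
                }
                events.add("commit");
                sinceCommit = 0;
            }
        }
        if (inBatch > 0) events.add("executeBatch");
        if (sinceCommit > 0) events.add("commit");
        return events;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("r1", "r2", "bad", "r3", "r4", "r5");
        // prints [executeBatch, error:bad, executeBatch, commit, executeBatch, commit]
        System.out.println(load(rows, 2, 4, 1, row -> row.startsWith("bad")));
    }
}
```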
Stay tuned for the next topic – Web 2.0 Starter Stores (Madisons B2C & Elite B2B)
The Madisons starter store improves on the Web 2.0 store through more convenient packaging, improvements in store functionality to aid shoppers in their shopping experience, and improvements to store design to reduce development costs.
The Elite Web 2.0-based B2B starter store provides a rich experience and a streamlined checkout process, enabling B2B businesses to deliver a B2C-like shopping experience.
Given a choice, would you recommend the new DataLoad or BODL? From the description, it seems BODL is more flexible and performant. Please advise.