Need to load data from a single file with a 100,000+ records into multiple tables on MySQL maintaining the relationships defined in the file/tables; meaning the relationships already match. The solution should work on the latest version of MySQL, and needs to use the InnoDB engine; MyISAM does not support foreign keys.
I am a completely new to using Pentaho Data Integration (aka Kettle) and any pointers would be appreciated.
I might add that it is a requirement that the foreign key constraints are NOT disabled. Since it's my understanding that if there is something wrong with the database's referential integrity, MySQL will not check for referential integrity when the foreign key constraints are turned back on. SOURCE: 5.1.4. Server System Variables -- foreign_key_checks
All approaches should include some from of validation and a rollback strategy should an insert fail, or fail to maintain referential integrity.
Again, completely new to this, and doing my best to provide as much information as possible, if you have any questions, or request for clarification -- just let me know.
If you are able to post the XML from the kjb and ktr files (jobs/transformations) that would be SUPER. Might even hunt down every comment/answer you've every made anywhere and up vote them... :-) ...really, it's really important to me to find an answer for this.
Thanks!
SAMPLE DATA: To better elaborate with an example, lets assume I am trying to load a file containing employee name, the offices they have occupied in the past and their Job title history separated by a tab.
File:
EmployeeName<tab>OfficeHistory<tab>JobLevelHistory
John Smith<tab>501<tab>Engineer
John Smith<tab>601<tab>Senior Engineer
John Smith<tab>701<tab>Manager
Alex Button<tab>601<tab>Senior Assistant
Alex Button<tab>454<tab>Manager
NOTE: The single table database is completely normalized (as much as a single table may be) -- and for example, in the case of "John Smith" there is only one John Smith; meaning there are no duplicates that would lead to conflicts in referential integrity.
The MyOffice
database schema has the following tables:
Employee (nId, name)
Office (nId, number)
JobTitle (nId, titleName)
Employee2Office (nEmpID, nOfficeId)
Employee2JobTitle (nEmpId, nJobTitleID)
So in this case. the tables should look like:
Employee
1 John Smith
2 Alex Button
Office
1 501
2 601
3 701
4 454
JobTitle
1 Engineer
2 Senior Engineer
3 Manager
4 Senior Assistant
Employee2Office
1 1
1 2
1 3
2 2
2 4
Employee2JobTitle
1 1
1 2
1 3
2 4
2 3
Here's the MySQL DDL to create the database and tables:
create database MyOffice2;
use MyOffice2;
CREATE TABLE Employee (
id MEDIUMINT NOT NULL AUTO_INCREMENT,
name CHAR(50) NOT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB;
CREATE TABLE Office (
id MEDIUMINT NOT NULL AUTO_INCREMENT,
office_number INT NOT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB;
CREATE TABLE JobTitle (
id MEDIUMINT NOT NULL AUTO_INCREMENT,
title CHAR(30) NOT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB;
CREATE TABLE Employee2JobTitle (
employee_id MEDIUMINT NOT NULL,
job_title_id MEDIUMINT NOT NULL,
FOREIGN KEY (employee_id) REFERENCES Employee(id),
FOREIGN KEY (job_title_id) REFERENCES JobTitle(id),
PRIMARY KEY (employee_id, job_title_id)
) ENGINE=InnoDB;
CREATE TABLE Employee2Office (
employee_id MEDIUMINT NOT NULL,
office_id MEDIUMINT NOT NULL,
FOREIGN KEY (employee_id) REFERENCES Employee(id),
FOREIGN KEY (office_id) REFERENCES Office(id),
PRIMARY KEY (employee_id, office_id)
) ENGINE=InnoDB;
My Notes in Response to Selected Answer:
PREP:
- (a) Use the sample data, create a CSV by changing
<TAB>
to comma delimited.
- (b) Install MySQL and create sample database using the MySQL DDL sample
- (c) Install Kettle (it's Java based and will run on anything that runs Java)
- (d) Download KTR file
Dataflow by Step: (My Notes)
- Open the KTR file in Kettle, and double clicked the "CSV file input" and browse to the CSV file that you created. The delimiter should already be set to comma. Then click OKAY.
- Double click "Insert Employees" and select DB connector then follow these directions on Creating a New Database Connection
See Question&Answers more detail:
os