Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
519 views
in Technique[技术] by (71.8m points)

python - Database Version Control for MySQL

What method do you use to version-control your database? I've committed all our database tables as separate .sql scripts to our respository (mercurial). In that way, if any member of the team makes a change to the employee table, say, I will immediately know which particular table has been modified when I updated my repository.

Such a method was described in: What are the best practices for database scripts under code control. Presently, I'm writing a python script to execute all the .sql files within the database folder, however, the issue of dependencies due to foreign-key constraints ensures we can't just run the .sql files in just any order.

The python script is to generate a file with the order in which to execute the .sql files. It will execute the .sql files in the order in which they appear in the tableorder.txt file. A table cannot be executed until its foreign key table has been executed, for example:

tableorder.txt

table3.sql
table1.sql
table7.sql and so on

Already, I have generated the dependency list for each table, from code, by parsing the result of the "show create table" mysql command. The dependency list may look thus:

tblstate: tblcountry //tblcountry.sql must be executed before tblstate.sql etc
tblemployee: tbldepartment, tblcountry

To generate the content of the tableorder.txt, I will need an algorithm that will look thus:

function print_table(table):
  foreach table in database:
    if table.dependencies.count == 0
      print to tableorder.txt
    if table.dependencies.count > 0
      print_table(dependency) //print dependency first
end function

As you will imagine, this involves lots of recursion. I'm beginning to wonder if it's worth the effort? If there's some tool out there? What tool (or algorithm) is there to generate a list of the order to execute separate .sql tables and views taking into consideration dependencies? Is it better to version control separate .sql file for each table/view or better to version control the entire database to a single .sql file? I will appreciate any response as this has taken so many days. Thanks.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I do not use MySQL, but rather SQL Server, however, this is how I version my database:

(This is long, but in the end I hope the reasoning for me abandoning a simple schema dump as the primary way to handle database versioning is made apparent.)

  1. I make a modification to the schema and apply it to a test database.

  2. I generate delta change scripts and a dump of the schema after said scripts. (I use ApexSQL, but there are likely MySQL-specific tools to help.)

    1. The delta change scripts know how to go from the current to target schema version: ALTER TABLE existing, CREATE TABLE new, DROP VIEW old .. Multiple operations can occur within the same .SQL file as the delta is of importance.

    2. The dump of the schema is of the target schema version: CREATE TABLE a, CREATE VIEW b .. there is no "ALTER" or "DROP" here, because it is just a snapshot of the target schema. There is one .SQL file per database object as the schema is of importance.

  3. I use RoundhousE to apply the delta change scripts. (I do not use the RoundhousE "anytime script" feature as this does not correctly handle relationships.)

I learned the hard way that applying database schema changes cannot be reliably done without a comprehensive step-by-step plan and, similarly (as noted in the question), the order of relationship dependencies are important. Just storing the "current" or "end" schema is not sufficient. There are many changes that cannot be retroactively applied A->C without knowing A->B->C and some changes B might involve migration logic or corrections. SQL schema change scripts can capture these changes and allow them to be "replayed".

However, at the same time just saving the delta scripts does not provide a "simple view" of the target schema. This is why I also dump all the schema as well as the change scripts and version both. The view dump could, in theory, be used to construct the database but due to relationship dependencies (the very kind noted in the question), it may take some work and I do not use it as part of an automated schema-restore approach: yet, keeping the schema dump part of the Hg version-control allows quick identification of changes and viewing the target schema at a particular version.

The change deltas thus move forward through the revisions while the schema dump provides a view at the current revision. Because the change deltas are incremental and forward-only it is important to keep the branch dealing with these changes "clean", which is easy to do with Hg.

In one of my projects I am currently at database change number 70 - and happy and productive! - after switching to this setup. (And these are deployed changes, not just development changes!)

Happy coding.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...