I noticed that SQLAlchemy was slow fetching (and ORMing) some data, which was rather fast to fetch using bare bone SQL. First off, I created a database with a million records:
mysql> use foo
mysql> describe Foo;
+-------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
| A | int(11) | NO | | NULL | |
| B | int(11) | NO | | NULL | |
| C | int(11) | NO | | NULL | |
+-------+---------+------+-----+---------+-------+
mysql> SELECT COUNT(*) FROM Foo;
+----------+
| COUNT(*) |
+----------+
| 1000000 |
+----------+
mysql>
As a crude test, querying all Foo's takes approximately 2 seconds:
herbert@dev0 ~ $ date; echo 'use foo; select * from Foo;' | mysql -uroot -pxxx > /dev/null; date
zo apr 20 18:48:49 CEST 2014
zo apr 20 18:48:51 CEST 2014
If I do this in python using MySQLdb this takes a approximately 3 seconds, including the construction of Foo objects:
herbert@dev0 ~ $ python BareORM.py
query execution time: 0:00:02.198986
total time: 0:00:03.403084
Which is the output of:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import MySQLdb
import sys
import time
import datetime
class Foo:
def __init__(self, a, b, c):
self.a=a; self.b=b; self.c=c;
try:
start = datetime.datetime.now()
con = MySQLdb.connect('localhost', 'root', 'xxx', 'foo')
cur = con.cursor();
cur.execute("""SELECT * FROM Foo LIMIT 1000000""")
print "query execution time: ", datetime.datetime.now()-start
foos = [];
for elem in cur:
foos.append(Foo(elem[1], elem[2], elem[3]))
con.commit()
except MySQLdb.Error, e:
print "Error %d: %s" % (e.args[0], e.args[1])
sys.exit(1)
finally:
if con: con.close()
print "total time: ", datetime.datetime.now()-start
However, using SQLAlchemy to reduce boilerplate code, it needed approximately 25 seconds to do the same job:
herbert@dev0 ~ $ python AlchemyORM.py
total time: 0:00:24.649279
Using this code:
import sqlalchemy
import datetime
import MySQLdb
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, relationship, backref
Base = declarative_base()
class Foo(Base):
__tablename__ = 'Foo'
id = Column(Integer, primary_key=True)
A = Column(Integer(unsigned=False), nullable=False)
B = Column(Integer(unsigned=False), nullable=False)
C = Column(Integer(unsigned=False), nullable=False)
engine = create_engine('mysql+mysqldb://root:xxx@localhost/foo')
Session = sessionmaker(bind=engine)
session = Session()
start = datetime.datetime.now()
foos = session.query(Foo).limit(1000000).all()
print "total time: ", datetime.datetime.now()-start
Why does SQLAlchemy operate ~10x slower than the bare SQL solution, assuming that SQLAlchemy should do approximately the same thing? Can I speed things up somehow?
This is a minimal working example of a more complicated query, which joins several tables using eager loading. I was considering just doing simple queries on a single table, and then using dictionaries to create id->object maps and collate one-to-N relations. But before doing so, I want to be sure that SQLAlchemy is unable to perform better, because writing your own ORM is a bad idea from a software design point of view. Imho, a 2x slowdown would be acceptable (maybe).
If you know about other (faster) python-SQL ORM's, or maybe BigTable-alike solutions (that already are the ORM), feel free to mention them as a comment.
EDIT: Also tried this with Peewee, which resulted in ~15 s.
from peewee import *
import datetime;
database = MySQLDatabase("foo", host="localhost", port=3306, user="root", passwd="xxx")
class Foo(Model):
id = IntegerField()
A = IntegerField()
B = IntegerField()
C = IntegerField()
class Meta:
db_table = 'Foo'
database = database
start = datetime.datetime.now()
foos = Foo.select()
cnt=0;
for i in foos: cnt=cnt+1
print "total time: ", datetime.datetime.now() - start
EDIT: As a response to Matthias I tried to do the same thing in Java with Hibernate, the result is approximately 8 to 10 seconds, not exactly fast, but a lot faster than 25 seconds. The code, starting with some classes and ending with some configuration:
package herbert.hibernateorm;
import java.util.List;
import org.hibernate.Session;
import org.hibernate.Transaction;
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;
public class App {
public static void main(String[] args) throws Exception {
SessionFactory factory = new Configuration().configure().buildSessionFactory();
Session session = factory.openSession();
Transaction tx = session.beginTransaction();
long start = System.currentTimeMillis();
List foos = session.createQuery("FROM Foo").list();
System.out.println(foos.size());
System.out.printf("total time: %d
", System.currentTimeMillis() - start);
session.close();
}
}
package herbert.hibernateorm;
public class Foo {
private int id, a, b, c;
public Foo() {}
public Foo(int A, int B, int C) { this.a=A; this.b=B; this.c=C; }
public int getId() { return id; }
public void setId(int id) { this.id = id; }
public int getA() { return a; }
public void setA(int a) { this.a = a; }
public int getB() { return b; }
public void setB(int b) { this.b = b; }
public int getC() { return c; }
public void setC(int c) { this.c = c; }
}
The configuration (hibernate.cfg.xml and hibernate.hbm.xml respectively)
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE hibernate-configuration PUBLIC "-//Hibernate/Hibernate Configuration DTD 3.0//EN" "http://hibernate.sourceforge.net/hibernate-configuration-3.0.dtd">
<hibernate-configuration>
<session-factory>
<property name="hibernate.dialect">org.hibernate.dialect.MySQLDialect</property>
<property name="hibernate.connection.driver_class">com.mysql.jdbc.Driver</property>
<property name="hibernate.connection.url">jdbc:mysql://localhost:3306/foo?zeroDateTimeBehavior=convertToNull</property>
<property name="hibernate.connection.username">root</property>
<property name="hibernate.connection.password">xxx</property>
<mapping resource="hibernate.hbm.xml"/>
</session-factory>
</hibernate-configuration>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE hibernate-mapping PUBLIC "-//Hibernate/Hibernate Mapping DTD 3.0//EN" "http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">
<hibernate-mapping>
<class name="herbert.hibernateorm.Foo" table="Foo" catalog="foo">
<id name="id" type="int">
<column name="id" />
<generator class="assigned" />
</id>
<property name="a" type="int">
<column name="A" not-null="true" />
</property>
<property name="b" type="int">
<column name="B" not-null="true" />
</property>
<property name="c" type="int">
<column name="C" not-null="true" />
</property>
</class>
</hibernate-mapping>
And finally the pom file to run it all in maven:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>herbert</groupId>
<artifactId>hibernateORM</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>jar</packaging>
<name>hibernateORM</name>
<url>http://maven.apache.org</url>
<repositories>
<repository>
<id>unknown-jars-temp-repo</id>
<name>A temporary repository created by NetBeans for libraries and jars it could not identify. Please replace the dependencies in this repository with correct ones and delete this repository.</name>
<url>file:${project.basedir}/lib</url>
</repository>
</repositories>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>3.8.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.21</version>
</dependency>
<dependency>
<groupId>org.hibernate</groupId>
<artifactId>hibernate-core</artifactId>
<version>4.0.1.Final</version>
</dependency>
<dependency>
<groupId>org.hibernate</groupId>
<artifactId>hibernate-entitymanager</artifactId>
<version>4.0.1.Final</version>
</dependency>
<dependency>
<groupId>org.hibernate.common</groupId>
<artifactId>hibernate-commons-annotations</artifactId>
<version>4.0.1.Final</version>
</dependency>
<dependency>
<groupId>nz.ac.waikato.cms.weka</groupId>
<artifactId>weka-dev</artifactId>
<version>3.7.10</version>
</dependency>
<dependency>
<groupId>commons-configuration</groupId>
<artifactId>commons-configuration</artifactId>
<version>1.9</version>
</dependency>
<dependency>
<groupId>commons-net</groupId>
<artifactId>commons-net</artifactId>
<version>3.1</version>
<classifier>examples</classifier>
</dependency>
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.2.2</version>
</dependency>
<dependency>
<groupId>maven</groupId>
<artifactId>maven-jetty-plugin</artifactId>