I have a master-slave replication setup for the MySQL backend of a Django application.
Currently I read and write against the master only, but my dashboards are quite query-intensive.
I was searching for an option where I can define something like the following:
DATABASES = {
    'default_slave': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '3306',
    },
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '3306',
    },
}
For dashboards, reports and various other apps, what I want to do is:
Try the connection to 'default_slave'; if it is reachable, use 'default_slave', else fall back to 'default'.
That is, if the slave is up, fetch the reports from the slave database; if not, fetch them from the master database.
The catch is that the slave can be up or down, and I want the database used for fetching reports to be selected dynamically, based on reachability.
Is this possible? Can I test the connection beforehand and proceed accordingly?
With this, I would write and run syncdb against the master, and always read from the slave whenever it is up.
I need a solution/hint for raw queries as well as ORM queries.
The router concept seems nice, but I don't know whether it can fall back to the master when the slave is not reachable.
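For the "test the connection beforehand" part, one lightweight approach (a sketch, not a definitive implementation; `pick_alias` and its parameters are hypothetical names) is to probe the slave's host and port with a short-timeout TCP connect before choosing the database alias:

```python
import socket

def pick_alias(slave_host, slave_port, slave_alias="default_slave",
               master_alias="default", timeout=1.0):
    """Return the slave alias if a TCP connection to the slave's MySQL
    port succeeds within `timeout` seconds, else fall back to the master."""
    try:
        with socket.create_connection((slave_host, slave_port), timeout=timeout):
            return slave_alias
    except OSError:  # connection refused, timed out, host unresolvable, ...
        return master_alias
```

The chosen alias then works for both query styles: `Report.objects.using(alias)` for the ORM, and `django.db.connections[alias].cursor()` for raw SQL. Note that the probe only proves the port is open, not that replication is caught up.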
UPDATE
How should I handle a multi-database setup like this?
DATABASES = {
    'default_slave': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '3306',
    },
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '3306',
    },
    'linux': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '3306',
    },
    'linux_slave': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '3306',
    },
    'mac': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '3306',
    },
    'mac_slave': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '3306',
    },
    'pc': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '3306',
    },
    'pc_slave': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '3306',
    },
}
Now, I have two kinds of data:
1. Static data
2. Dynamic data
Static data is stored in 'default', which is replicated to 'default_slave'.
For dynamic data, the query first needs to determine where the data lives: in 'mac', 'pc' or 'linux'.
To achieve that, I added a field to the static table, 'query_on', which contains one of 'mac', 'linux' or 'pc'.
Now, while using a queryset, I simply write:

    static = Static.objects.get(pk=1)
    query_on = static.query_on
    dynamic = Dynamic.objects.using(query_on).get(static=static)
This works well; the query is routed to the database it needs to be executed on. Here I need to decide:
- if the connection to <query_on>_slave is up, use <query_on>_slave
- if the connection to <query_on>_slave is down, use <query_on>
How do I go about that?
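The decision above can be kept in one small helper (a minimal sketch; `choose_read_alias` and the injected `probe` callable are hypothetical names, with the reachability check passed in so the routing logic stays independent of how the check is done):

```python
def choose_read_alias(query_on, probe):
    """Prefer '<query_on>_slave' when `probe` reports it reachable,
    otherwise fall back to the master alias `query_on` itself."""
    slave_alias = "%s_slave" % query_on
    return slave_alias if probe(slave_alias) else query_on
```

`probe` could be any reachability check, e.g. a short-timeout TCP connect to the slave's host. The result plugs straight into the queryset: `Dynamic.objects.using(choose_read_alias(query_on, probe)).get(static=static)`.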
Further details about the application:
- There is one configuration & analytics database, 'default', which holds the configuration data and the report-analytics data.
- There are 20 "raw" databases, e.g. 'mac', 'linux', 'rhel', 'windows', 'pc', ... (example names), which collect raw data that has not yet been processed for analytics.
- Each database has one or more slaves; the naming convention is default_slave_0, default_slave_1, default_slave_2, and likewise for the other databases.
The analytics queries need to run every 5 minutes, 30 minutes, 1 hour, ..., and each query has to be sent to a specific database, because not every database carries the dataset required for a given analysis.
To do that, we need to:
- get the configuration data from 'default' or one of its slaves (the slave part is the question)
- once we have the configuration, look up where the "raw" data might be
- query for the raw data, gather the results, analyse them, and store the outcome in the 'default' database.
Now, all 30 raw databases plus the one 'default' database require "sync", as we maintain the same database structure across all nodes. And since we are seeing CPU spikes on all the databases, it makes sense to query the "raw" data on the slave databases, hence the requirement of using them. I am not able to imagine how routers would be of help here.
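On the router question: a database router's `db_for_read` hook can encode the same fallback, so every ORM read transparently prefers a live slave. Below is a sketch under the assumption that some reachability probe is available (the class name and `probe` are hypothetical). Note that a router only sees the model being queried, not which raw database a given row lives on, so it covers the 'default'/'default_slave' pair; per-row routing to 'mac'/'linux'/'pc' still needs `.using()`:

```python
class FallbackReadRouter:
    """Django database router: reads go to 'default_slave' when it is
    reachable, writes always go to 'default'."""

    def __init__(self, probe=None):
        # `probe` is a hypothetical reachability check; Django instantiates
        # routers with no arguments, so default to "always up" here.
        self.probe = probe if probe is not None else (lambda alias: True)

    def db_for_read(self, model, **hints):
        return "default_slave" if self.probe("default_slave") else "default"

    def db_for_write(self, model, **hints):
        return "default"

    def allow_relation(self, obj1, obj2, **hints):
        # master and slave hold the same data, so relations are allowed
        return True
```

The router would be registered via `DATABASE_ROUTERS = ["path.to.FallbackReadRouter"]` in settings; probing on every read adds latency, so caching the probe result for a few seconds is worth considering.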