
node.js - AWS Lambda - MySQL caching

I have a Lambda that uses RDS. I wanted to improve it and use Lambda connection caching. I have found several articles and implemented it on my side, to the best of my knowledge. But now I am not sure this is the right way to go.

I have a Lambda (running Node 8) split across several files pulled in with require. I will start from the main function and follow the exact path down to the MySQL initializer. Everything is kept super simple, showing only the flow of the code that runs MySQL:

Main Lambda:

const jobLoader = require('./Helpers/JobLoader');

exports.handler = async (event, context) => {
    const emarsysPayload = event.Records[0];
    let validationSchema;

    const body = jobLoader.loadJob('JobName');
     ...
    return;
...//

Job Code:

const MySQLQueryBuilder = require('../Helpers/MySqlQueryBuilder');

exports.runJob = async (params) => {
      const data = await MySQLQueryBuilder.getBasicUserData(userId);

MySQLBuilder:

const mySqlConnector = require('../Storage/MySqlConnector');

class MySqlQueryBuilder {
    async getBasicUserData (id) {
        let query = `
SELECT * from sometable WHERE id= ${id}
`;

        return mySqlConnector.runQuery(query);
    }
}

// Export a single shared instance so the job code can call getBasicUserData directly.
module.exports = new MySqlQueryBuilder();

And finally, the connector itself:

const mySqlConnector = require('promise-mysql');

const pool = mySqlConnector.createPool({
    host: process.env.MY_SQL_HOST,
    user: process.env.MY_SQL_USER,
    password: process.env.MY_SQL_PASSWORD,
    database: process.env.MY_SQL_DATABASE,
    port: 3306
});

exports.runQuery = async query => {
    const con = await pool.getConnection();
    try {
        // Wait for the query to finish before handing the connection back to the pool.
        return await con.query(query);
    } finally {
        con.release();
    }
};

I know that measuring performance will show the actual results, but today is Friday and I will not be able to run this on Lambda until late next week... And really, it would be an awesome start to the weekend to know I am heading in the right direction... or not.

Thanks for the input.


1 Reply


The first thing is to understand how require works in Node.js. I recommend you go through this article if you're interested in knowing more about it.

Now, once you have required your connection module, you have it for good and it won't be evaluated again. This matches what you're looking for, as you don't want to overwhelm your database by creating a new connection every time.
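
A quick sketch of that caching behavior (the file names below are made up purely for illustration): the module body runs once per container, and every later require returns the same cached exports object.

// pool-holder.js - hypothetical module; its body runs once per container
console.log('evaluating module');             // logged only on the first require
module.exports = { createdAt: Date.now() };

// handler.js
const first = require('./pool-holder');
const second = require('./pool-holder');      // served from the require cache

console.log(first === second);                // true - same object, not re-evaluated

This is the same reason the pool in your MySqlConnector file is created only once per container, no matter how many files require it.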

But, there is a problem...

Lambda Cold Starts

Whenever you invoke a Lambda function for the first time, it will spin up a container with your function inside it and keep it alive for approximately 5 minutes. It's very likely (although not guaranteed) that you will hit the same container every time as long as you are making one request at a time. But what happens if you have 2 requests at the same time? Then another container will be spun up in parallel with the previous, already warmed-up one. You have just created another connection on your database, and now you have 2 containers. Guess what happens if you have 3 concurrent requests? Yes, one more container, which means one more DB connection.
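
If you want to see this for yourself, a tiny handler like the (made-up) one below makes it visible in CloudWatch Logs: the id is generated once per container, so warm sequential invocations log the same id, while concurrent invocations log different ids - one per container, and therefore one connection pool per container.

// Hypothetical handler used only to observe container reuse in the logs.
const containerId = Math.random().toString(36).slice(2); // assigned once per cold start

exports.handler = async () => {
    console.log('containerId:', containerId);
    return containerId;
};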

As long as there are new requests to your Lambda functions, by default they will scale out to meet demand (you can configure a limit in the console to cap the number of concurrent executions, within your account limits).
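
As a hedged illustration, you could also set that cap programmatically with the AWS SDK for JavaScript (v2); the function name, region and limit below are assumptions, and doing the same thing in the console works just as well.

// Sketch: reserve at most 10 concurrent executions for the function,
// which also bounds the number of containers (and thus DB connection pools).
const AWS = require('aws-sdk');
const lambda = new AWS.Lambda({ region: 'us-east-1' }); // region is an assumption

lambda.putFunctionConcurrency({
    FunctionName: 'my-mysql-lambda',        // hypothetical function name
    ReservedConcurrentExecutions: 10        // at most 10 containers at any moment
}).promise()
    .then(() => console.log('Concurrency limit set'))
    .catch(console.error);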

You cannot reliably keep a fixed number of connections to your database simply by requiring your code upon a function's invocation. The good thing is that this is not your fault; this is just how Lambda functions behave.

...one other approach is

to cache the data you want in a real caching system, like ElastiCache, for example. You could then have one Lambda function triggered by a scheduled CloudWatch Event. That function would query your DB and store the results in the external cache. This way you make sure your DB connection is only opened by one Lambda at a time, because the scheduled CloudWatch Event fires that single function once per trigger, while everything else reads from the cache.
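
To make the idea concrete, here is a rough sketch of such a refresher function. It assumes Redis on ElastiCache with the ioredis client; the key name, the query and the REDIS_HOST environment variable are made-up examples, not something taken from your code.

// Triggered by a scheduled CloudWatch Event (e.g. every 5 minutes).
// Only this function ever opens a MySQL connection; readers hit Redis instead.
const mysql = require('promise-mysql');
const Redis = require('ioredis');

const redis = new Redis({ host: process.env.REDIS_HOST, port: 6379 });

exports.handler = async () => {
    const con = await mysql.createConnection({
        host: process.env.MY_SQL_HOST,
        user: process.env.MY_SQL_USER,
        password: process.env.MY_SQL_PASSWORD,
        database: process.env.MY_SQL_DATABASE
    });

    try {
        const rows = await con.query('SELECT * FROM sometable');
        // Cache the result for 5 minutes.
        await redis.set('sometable:all', JSON.stringify(rows), 'EX', 300);
    } finally {
        await con.end();
    }
};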

EDIT: after the OP shared a link in the comments, I have decided to add a bit more information to clarify what the linked article is saying.

From the article:

"Simple. You ARE able to store variables outside the scope of our handler function. This means that you are able to create your DB connection pool outside of the handler function, which can then be shared with each future invocation of that function. This allows for pooling to occur."

And this is exactly what you're doing. And it works! But the problem comes when you have N requests (and therefore up to N containers) at the same time. If you don't set any limits, by default up to 1000 Lambda containers can be spun up concurrently, each holding its own connection. If you then make another 1000 simultaneous requests within the next 5 minutes, it's very likely you won't be opening any new connections, because they have already been opened on previous invocations and the containers are still alive.
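
If you do want to bound the damage, here is a back-of-the-envelope sizing sketch (all numbers are assumptions): the worst case on MySQL is containers × connectionLimit, so capping both keeps you under the database's max_connections.

// Mirrors the connector from the question, with an explicit per-container pool size.
// With, say, 10 reserved containers and connectionLimit 2: at most 20 DB connections.
const pool = require('promise-mysql').createPool({
    host: process.env.MY_SQL_HOST,
    user: process.env.MY_SQL_USER,
    password: process.env.MY_SQL_PASSWORD,
    database: process.env.MY_SQL_DATABASE,
    connectionLimit: 2   // per-container cap; the default is 10
});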

