Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
93 views
in Technique[技术] by (71.8m points)

php - How to setup ElasticSearch index structure with multiple entity bindings


recently I started working on ElasticSearch (ES) implementation into legacy e-commerce app written in PHP using MySQL. I am completely new to all this stuff and reading the docs is fine, yet I really need somebody with experience to advise me.

From the ES documentation I was able to setup a new cluster and I also found out that rivers are deprecated and should be replaced, so I replaced them with Logstash and JDBC MySQL connector.

At this point I have:

  • ElasticSearch
  • Logstash
  • JDBC MySQL driver
  • MySQL server

The database structure of the application is not really optimal and is very hard to replace, but I'd like to replicate it into the ES index in the best possible way.

DB Structure:

Products

+-------------------------------+-------+--------+
|              Id               | Title | Price  |
+-------------------------------+-------+--------+
| 00c8234d71c4e94f725cd432ebc04 | Alpha | 589,00 |
| 018357657529fef056cf396626812 | Beta  | 355,00 |
| 01a2c32ceeff0fc6b7dd4fc4302ab | Gamma | 0,00   |
+-------------------------------+-------+--------+

Flags

+------------+-------------+
|     Id     |    Title    |
+------------+-------------+
| sellout    | Sellout     |
| discount   | Discount    |
| topproduct | Top Product |
+------------+-------------+

flagsProducts (n:m pivot)

+------+-------------------------------+------------+------------+
|  Id  |           ProductId           |   FlagId   | ExternalId |
+------+-------------------------------+------------+------------+
| 1552 | 00c8234d71c4e94f725cd432ebc04 | sellout    | NULL       |
| 2845 | 00c8234d71c4e94f725cd432ebc04 | topproduct | NULL       |
| 9689 | 018357657529fef056cf396626812 | discount   | NULL       |
| 4841 | 01a2c32ceeff0fc6b7dd4fc4302ab | discount   | NULL       |
+------+-------------------------------+------------+------------+

Those string IDs are a complete disaster (but I have to deal with them now). At first I thought I should do a flat structure of Products index to ES, but what about multiple entity bindings?

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

That's a great start!

I would definitely flatten it all out (i.e. denormalize) and come up with product documents that look like the one below. That way you get rid of the N:M relationship between products and flags by simply creating a flags array for each product. It will thus be easier to query those flags.

{
   "id": "00c8234d71c4e94f725cd432ebc04",
   "title": "Alpha",
   "price": 589.0,
   "flags": ["Sellout", "Top Product"]
}
{
   "id": "018357657529fef056cf396626812",
   "title": "Beta",
   "price": 355.0,
   "flags": ["Discount"]
}
{
   "id": "01a2c32ceeff0fc6b7dd4fc4302ab",
   "title": "Gamma",
   "price": 0.0,
   "flags": ["Discount"]
}

The product mapping type would look like this:

PUT products
{
    "mappings": {
        "product": {
            "properties": {
                "id": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "title": {
                    "type": "string"
                },
                "price": {
                    "type": "double",
                    "null_value": 0.0
                },
                "flags": {
                    "type": "string",
                    "index": "not_analyzed"
                }
            }
        }
    }
}

Since you have the logstash jdbc input already, all you're missing is the proper SQL query to fetch the products and associated flags.

  SELECT p.Id as id, p.Title as title, p.Price as price, GROUP_CONCAT(f.Title) as flags
    FROM Products p
    JOIN flagsProducts fp ON fp.ProductId = p.Id
    JOIN Flags f ON fp.FlagId = f.id
GROUP BY p.Id

Which would get you rows like these:

+-------------------------------+-------+-------+---------------------+
| id                            | title | price | flags               |
+-------------------------------+-------+-------+---------------------+
| 00c8234d71c4e94f725cd432ebc04 | Alpha |   589 | Sellout,Top product |
| 018357657529fef056cf396626812 | Beta  |   355 | Discount            |
| 01a2c32ceeff0fc6b7dd4fc4302ab | Gamma |     0 | Discount            |
+-------------------------------+-------+-------+---------------------+

Using Logstash filters you can then split the flags into an array and you're good to go.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...