Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
129 views
in Technique[技术] by (71.8m points)

How do I design a database to store properties, selecting attributes by synonyms

I am designing a database for a real estate application. It is proving to be more involved than I had anticipated (maybe I am overcomplicating things).

The problems essentially are due to the presence of:

  • synonyms For example the terms: flat, apartment and penthouse may all refer to essentially, the same type of property
  • attributes (different property types have different attributes) For example an apartment could be a ground floor or top floor etc

I have ended up with a rather (unintentionally) elaborate classification tree for different property types. The tree nodes are the actual instances of property types.

I want to create a database so that I can query using not only any of the synonyms, but also the attributes.

So for example, the query (in pseudo SQL):

SELECT * from properties where synonym="flat" and attribute IN ('ground floor', 'garden');

should return a list of apartments|flats that are either ground floor AND have a garden.

Can someone help me with how to design the database schema so as to allow the kind of querying described above?

Last but not the least, I will be using either MySQl or PostgreSQL as the backend database, but would prefer the approach to be db agnostic - if possible.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I would take a different approach to your attribution scheme. Rather than treating different attributions as synonyms, I would treat them as overlapping, or more specifically, nested descriptions of a property. This would handle your business case while at the same time acknowledging the astute observation made by Mike Sherrill.

Here is a quick ERD sketch:

ERD

By way of a very fast data dictionary:

PROPERTY is a piece of real estate.

CATEGORY is a collection of descriptive attributes. The point of this table is more as an organizer of attributes than anything else. It could include things like "type of property", "ownership structure", "number of bathrooms", and whatever else might be of interest.

ATTRIBUTE is a specific quality of interest. Note the involuted relationship on this entity type. I'll deal more with that later. The main point is that attributes can be more general or more specific and some attributes can be seen as refinements of other attributes.

DESCRIPTOR is the intersection of a PROPERTY and the ATTRIBUTEs that have been associated with that particular piece of real estate.

So how is this supposed to help?

The key is how attributes work. If you use a nested set model, then you can address more or less specific attribution and searching criteria. Consider the following diagram of one potential CATEGORY with its associated ATTRIBUTEs:

enter image description here

In this example the CATEGORY is "type of property". You can see from the diagram that there is a hierarchical breakdown of attributes in this category. Each box in the diagram is a record in ATTRIBUTE. Boxes that contain other boxes have child attributes. Boxes that are inside another box have an FK to their containing box and so forth.

In this way, you could say "I want to find a property that is a Penthouse". You can then find PROPERTY records with a related DESCRIPTOR that points at the "Penthouse" ATTRIBUTE. That is pretty easy. But what if your search comes up empty?

The advantage of this approach is that you can then loosen your criteria by saying, "let's go up the attribution hierarchy to the next less-specific thing than penthouse". In my example, that would be "Highrise". Now you try your search again and you might have better luck.

A system like this gives you the ability to be as specific as you want in each category of attribution while relaxing the others far enough to start getting search hits. This is really what a real estate agent's work is about is it not? Helping the client to make the necessary compromises to find the best fit to their most important criteria?

Handling Nested Sets

The only tricky part of this approach is how to handle the nested sets. There are lots of ways to do this, many of which have been thoroughly documented elsewhere. I myself like the visitation number technique, especially for relatively static data sets. This makes it very easy to find matches for some given ATTRIBUTE or any of its children without having to do anything exotic in your SQL.

EDIT: So How Does This Work?

OP asked how do you handle things like number of bedrooms and what do the queries look like? Let's take another example for illustration:

Bedroom Example

The above shows the nested sets for the CATEGORY "Number of Bedrooms". I've also added the visitation numbers to the diagram. Note the way the visitation numbers work, in particular, note that the left (green) and right (red) numbers for any given attribute value contain the left and right visitation numbers for any subordinate attributes. For example, "2+ Bedrooms" has left and right numbers 6 and 15 respectively. Every attribute that falls under "2+ Bedrooms" has left and right numbers that fall within this range.

So how would you query for a properties with a given descriptor? Let's say we want to find all properties with two or more bedrooms. The SQL for such a query might look something like this:

select P.* 
from PROPERTY P
  inner join DESCRIPTOR D
    on P.id = D.property_id
  inner join ATTRIBUTE A
    on D.attribute_id = A.id
where A.left >= (select X.left from ATTRIBUTE X
                 where X.name = '2+ Bedrooms')
  and A.right <= (select Y.right from ATTRIBUTE Y
                  where Y.name = '2+ Bedrooms')

Note that the above query is a little different that what you might actually use. For example, you'd probably look up the filtering attribute using its int identity key instead of its string name. However, I thought I'd leave it as shown for clarity around the main point, which is you filter by looking not for a specific related attribute, but for any related attributes that fall within your filter range.

If you wanted to filter on multiple attributes, then just add more sub-clauses to your where clause.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...