Elasticsearch Relationship Data Modeling (Part 1): Application-Side Joins

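When migrating data from a relational database to Elasticsearch, you almost always have to deal with related data. There are four approaches to modeling such data, offered here for reference.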

The simplest of the four is to denormalize the data into flat, redundant documents; it is not covered in detail here.

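An application-side join is somewhat like a subquery in a relational database: the result of the first query becomes the condition of the second.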

PUT /my_index/user/1 
{
  "name":     "John Smith",
  "email":    "john@smith.com",
  "dob":      "1970/10/24"
}

PUT /my_index/blogpost/2 
{
  "title":    "Relationships",
  "body":     "It's complicated...",
  "user":     1 
}

Each blogpost links to its user by storing the user's id.

Finding the blog posts written by the user with ID 1 is easy.

GET /my_index/blogpost/_search
{
  "query": {
    "filtered": {
      "filter": {
        "term": { "user": 1 }
      }
    }
  }
}

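To find the blog posts written by users named John, we need to run two queries: first look up the ids of all users whose name contains John, then query blogpost by those ids as shown above.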

The ids returned by the first query are used to populate the terms filter of the second.

GET /my_index/user/_search
{
  "query": {
    "match": {
      "name": "John"
    }
  }
}

GET /my_index/blogpost/_search
{
  "query": {
    "filtered": {
      "filter": {
        "terms": { "user": [1, 3, 7] }  
      }
    }
  }
}
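
The two round trips above can be sketched in application code roughly as follows. This is a minimal sketch, not a definitive implementation: `search` is a hypothetical callable (document type and request body in, parsed JSON response out) standing in for whatever Elasticsearch client the application uses, and the function names are made up for illustration. It also assumes user ids are numeric, as in the example documents.

```python
from typing import Any, Callable, Dict, List

# Hypothetical signature: search(doc_type, body) -> parsed JSON response.
SearchFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def extract_user_ids(response: Dict[str, Any]) -> List[int]:
    """Collect the _id of every hit (assumes numeric ids, as in the example)."""
    return [int(hit["_id"]) for hit in response["hits"]["hits"]]


def build_blogpost_query(user_ids: List[int]) -> Dict[str, Any]:
    """Build the same filtered/terms request body shown above."""
    return {"query": {"filtered": {"filter": {"terms": {"user": user_ids}}}}}


def find_blogposts_by_user_name(search: SearchFn, name: str) -> List[Dict[str, Any]]:
    # Query 1: users whose name matches.
    users = search("user", {"query": {"match": {"name": name}}})
    user_ids = extract_user_ids(users)
    if not user_ids:
        return []  # nothing to join against, so skip the second round trip
    # Query 2: blog posts whose user field is one of those ids.
    posts = search("blogpost", build_blogpost_query(user_ids))
    return [hit["_source"] for hit in posts["hits"]["hits"]]
```

Note that a search returns only 10 hits by default, so the first query would need an explicit `size` (or scrolling) to collect every matching user.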

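Summary: the main advantage of application-side joins is that the data stays normalized. The drawback is that two queries are needed, which adds latency.
If a very large number of users are named John, say more than a million, the approach becomes very inefficient, because every matching id has to be fetched and then passed to the terms filter.
This approach suits cases where user contains only a small number of documents that, ideally, seldom change. That lets the application cache the results and avoid running the first query too often.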
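
The caching idea could look something like the sketch below: a small in-memory cache with a time-to-live in front of the first query. The class name `UserIdCache`, the `lookup` callable and the 60-second default are all assumptions made for illustration; a real application would pick a TTL that matches how often its user documents change.

```python
import time
from typing import Callable, Dict, List, Tuple


class UserIdCache:
    """Caches name -> user ids for ttl_seconds so the first query is not repeated."""

    def __init__(
        self,
        lookup: Callable[[str], List[int]],  # hypothetical: runs the user query
        ttl_seconds: float = 60.0,           # assumed default, tune per workload
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[int]]] = {}

    def get(self, name: str) -> List[int]:
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: no round trip to Elasticsearch
        ids = self._lookup(name)  # missing or expired: run the first query again
        self._entries[name] = (now, ids)
        return ids
```

The `clock` parameter exists only so that expiry can be exercised without waiting for real time to pass.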

Example: search on user name and blog title, and display each user together with a list of their most relevant blog posts.

This requires grouping the results by user name, then sorting each group by score and keeping the top N.

https://www.elastic.co/guide/cn/elasticsearch/guide/current/top-hits.html
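
The linked page covers doing this grouping inside Elasticsearch. Purely to illustrate the logic, here is a client-side sketch of "group, sort by score, keep the top N". It assumes each hit carries `_score` and a `user` field in `_source`, as in the blogpost document above, so it groups by user id rather than by name. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List


def top_posts_per_user(hits: List[Dict[str, Any]], n: int) -> Dict[Any, List[Dict[str, Any]]]:
    """Group search hits by user id and keep the n highest-scoring hits per user."""
    grouped: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for hit in hits:
        grouped[hit["_source"]["user"]].append(hit)
    return {
        user: sorted(user_hits, key=lambda h: h["_score"], reverse=True)[:n]
        for user, user_hits in grouped.items()
    }
```

Grouping on the client only sees the hits that were actually returned, which is one reason to prefer a server-side aggregation when result sets are large.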

Example: searching a file directory tree; see the following link.

https://www.elastic.co/guide/cn/elasticsearch/guide/current/denormalization-concurrency.html

Example: concurrency issues.

https://www.elastic.co/guide/cn/elasticsearch/guide/current/concurrency-solutions.html