Principles for modeling CouchDB documents

lottogame 2020. 7. 20. 21:01

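I have a question that I've been trying to answer for some time now but can't figure out:

How do you design, or divide up, CouchDB documents?

Take a blog post, for example.

The semi-"relational" way to do it would be to create a few objects:

  • Post
  • User
  • Comment
  • Tag
  • Snippet

This makes a great deal of sense. But I am trying to use CouchDB (for all the reasons that it's great) to model the same thing, and it's been extremely difficult.

Most of the blog posts out there give you an easy example of how to do this. They basically divide it up the same way, but say you can add "arbitrary" properties to each document, which is definitely nice. So you'd have something like this in CouchDB:

  • Post (with tags and snippets as "pseudo" models in the doc)
  • Comment
  • User

Some people would even say you could throw the Comment and User in there, so you'd have this: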


{
    "post": {
        "id": 123412804910820,
        "title": "My Post",
        "body": "Lots of Content",
        "html": "<p>Lots of Content</p>",
        "author": {
            "name": "Lance",
            "age": "23"
        },
        "tags": ["sample", "post"],
        "comments": [
            {
                "id": 93930414809,
                "body": "Interesting Post"
            },
            {
                "id": 19018301989,
                "body": "I agree"
            }
        ]
    }
}

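That looks very nice and is easy to understand. I also understand how you could write views that extract just the comments out of all your post documents to get them into Comment models, and the same for users and tags.

But then I think, "why not just put my whole site into a single document?":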


{
    "site": {
        "domain": "www.blog.com",
        "owner": "me",
        "pages": [
            {
                "title": "Blog",
                "posts": [
                    {
                        "id": 123412804910820,
                        "title": "My Post",
                        "body": "Lots of Content",
                        "html": "<p>Lots of Content</p>",
                        "author": {
                            "name": "Lance",
                            "age": "23"
                        },
                        "tags": ["sample", "post"],
                        "comments": [
                            {
                                "id": 93930414809,
                                "body": "Interesting Post"
                            },
                            {
                                "id": 19018301989,
                                "body": "I agree"
                            }
                        ]
                    },
                    {
                        "id": 18091890192984,
                        "title": "Second Post",
                        ...
                    }
                ]
            }
        ]
    }
}

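You could easily make views to find what you wanted with that.

Then the question I have is: how do you determine when to divide a document into smaller documents, or when to make "relations" between documents?

I think it would be much more "object oriented", and easier to map to Value Objects, if it were divided like so: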


{
    "posts": [
        {
            "id": 123412804910820,
            "title": "My Post",
            "body": "Lots of Content",
            "html": "<p>Lots of Content</p>",
            "author_id": "Lance1231",
            "tags": ["sample", "post"]
        }
    ],
    "authors": [
        {
            "id": "Lance1231",
            "name": "Lance",
            "age": "23"
        }
    ],
    "comments": [
        {
            "id": "comment1",
            "body": "Interesting Post",
            "post_id": 123412804910820
        },
        {
            "id": "comment2",
            "body": "I agree",
            "post_id": 123412804910820
        }
    ]
}

... but then it starts looking more like a relational database. And oftentimes I inherit something that looks like the "whole-site-in-a-document", so it's more difficult to model it with relations.

I've read lots of things about how/when to use relational databases vs. document databases, so that's not the main issue here. I'm mostly wondering what a good rule or principle is to apply when modeling data in CouchDB.

Another example is with XML files/data. Some XML data has nesting 10+ levels deep, and I would like to visualize that using the same client (Ajax on Rails for instance, or Flex) that I would to render JSON from ActiveRecord, CouchRest, or any other Object Relational Mapper. Sometimes I get huge XML files that are the entire site structure, like the one below, and I'd need to map it to Value Objects to use in my Rails app so I don't have to write another way of serializing/deserializing data:


<pages>
    <page>
        <subPages>
            <subPage>
                <images>
                    <image>
                        <url/>
                    </image>
                </images>
            </subPage>
        </subPages>
    </page>
</pages>

So the general CouchDB questions are:

  1. What rules/principles do you use to divide up your documents (relationships, etc)?
  2. Is it okay to put the entire site into one document?
  3. If so, how do you handle serializing/deserializing documents with arbitrary depth levels (like the large JSON example above, or the XML example)?
  4. Or do you not turn them into VOs, do you just decide "these ones are too nested to Object-Relational Map, so I'll just access them using raw XML/JSON methods"?

Thanks a lot for your help, the issue of how to divide up your data with CouchDB has been difficult for me to say "this is how I should do it from now on". I hope to get there soon.

I have studied the following sites/projects.

  1. Hierarchical Data in CouchDB
  2. CouchDB Wiki
  3. Sofa - CouchDB App
  4. CouchDB The Definitive Guide
  5. PeepCode CouchDB Screencast
  6. CouchRest
  7. CouchDB README

...but they still haven't answered this question.


There have been some great answers to this already, but I wanted to add some more recent CouchDB features to the mix of options for working with the original situation described by viatropos.

The key point at which to split up documents is where there might be conflicts (as mentioned earlier). You should never keep massively "tangled" documents together in a single document as you'll get a single revision path for completely unrelated updates (comment addition adding a revision to the entire site document for instance). Managing the relationships or connections between various, smaller documents can be confusing at first, but CouchDB provides several options for combining disparate pieces into single responses.

The first big one is view collation. When you emit key/value pairs into the results of a map/reduce query, the keys are sorted according to CouchDB's view collation rules (strings are compared with Unicode collation, so "a" comes before "b"). You can also output complex keys from your map/reduce as JSON arrays: ["a", "b", "c"]. Doing that lets you build a "tree" of sorts out of array keys. Using your example above, we can output the post_id, then the type of thing we're referencing, then its ID (if needed). If we then put the ID of the referenced document into an object in the value that's returned, we can use the 'include_docs' query param to include those documents in the map/reduce output:

{"rows":[
  {"key":["123412804910820", "author", "Lance1231"], "value":{"_id":"Lance1231"}},
  {"key":["123412804910820", "comment", "comment1"], "value":{"_id":"comment1"}},
  {"key":["123412804910820", "comment", "comment2"], "value":{"_id":"comment2"}},
  {"key":["123412804910820", "post"], "value":null}
]}

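If you request that same view with '?include_docs=true', each row gets an additional 'doc' key. It holds the document whose '_id' is referenced in the 'value' object or, if the 'value' object has no '_id', the document from which the row was emitted (in this case the 'post' document). Please note that these results would also include an 'id' field referencing the source document from which the emit was made. I've left it out for space and readability.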
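
For illustration, a map function that emits rows shaped like the ones above might look like the following sketch. It assumes every document carries a `type` field, that posts store an `author_id`, and that comments store a `post_id`, as in the split-up example from the question. The `emit` stub and the sample documents are there only so the snippet runs outside CouchDB; inside a design document only the `map` function would be stored, and CouchDB supplies `emit` itself.

```javascript
// Stand-in for CouchDB's built-in emit(), so the sketch runs under Node.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Hypothetical map function; the `type` field is an assumed convention.
function map(doc) {
  if (doc.type === "post") {
    // Null value: with include_docs the row carries the post document itself.
    emit([doc._id, "post"], null);
    // value._id points at the author, so include_docs fetches that document.
    emit([doc._id, "author", doc.author_id], { _id: doc.author_id });
  } else if (doc.type === "comment") {
    emit([doc.post_id, "comment", doc._id], { _id: doc._id });
  }
}

// Sample documents mirroring the split-up layout from the question.
const docs = [
  { _id: "123412804910820", type: "post", title: "My Post", author_id: "Lance1231" },
  { _id: "Lance1231", type: "author", name: "Lance" },
  { _id: "comment1", type: "comment", post_id: "123412804910820", body: "Interesting Post" },
  { _id: "comment2", type: "comment", post_id: "123412804910820", body: "I agree" },
];
docs.forEach((doc) => map(doc));
console.log(JSON.stringify(rows, null, 2));
```

The stub collects rows in emission order. CouchDB itself stores and returns them sorted by key.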

You can then use the 'start_key' and 'end_key' parameters to filter the results down to a single post's data:

?start_key=["123412804910820"]&end_key=["123412804910820", {}, {}]

Or even specifically extract the list for a certain type:

?start_key=["123412804910820", "comment"]&end_key=["123412804910820", "comment", {}]

These query parameter combinations work because an empty object ("{}") always sorts at the bottom of the collation, while null sorts at the top (and "" sorts before every other string).
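
To see why those ranges select the right rows, here is a rough emulation of the ordering in plain JavaScript. It is a simplified sketch, not CouchDB's implementation: it ranks types as null, booleans, numbers, strings, arrays, objects, compares strings by code unit rather than with ICU, and treats all objects as equal, which is enough for the `{}` sentinel. The `collate` and `range` helpers and the sample keys are made up for the illustration.

```javascript
// Simplified type ranking: null < booleans < numbers < strings < arrays < objects.
function rank(v) {
  if (v === null) return 0;
  if (typeof v === "boolean") return 1;
  if (typeof v === "number") return 2;
  if (typeof v === "string") return 3;
  if (Array.isArray(v)) return 4;
  return 5; // plain object
}

function collate(a, b) {
  const ra = rank(a);
  const rb = rank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 4) {
    // Arrays compare element by element; a shorter prefix sorts first.
    for (let i = 0; i < Math.min(a.length, b.length); i++) {
      const c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length;
  }
  if (ra === 0 || ra === 5) return 0; // sketch: nulls and objects compare equal
  return a < b ? -1 : a > b ? 1 : 0;
}

const keys = [
  ["123412804910820", "post"],
  ["123412804910820", "author", "Lance1231"],
  ["123412804910820", "comment", "comment1"],
  ["123412804910820", "comment", "comment2"],
  ["18091890192984", "post"],
];

// Inclusive range, like start_key/end_key with the default inclusive end.
function range(startKey, endKey) {
  return keys
    .filter((k) => collate(k, startKey) >= 0 && collate(k, endKey) <= 0)
    .sort(collate);
}

const wholePost = range(["123412804910820"], ["123412804910820", {}, {}]);
const onlyComments = range(
  ["123412804910820", "comment"],
  ["123412804910820", "comment", {}]
);
console.log(wholePost.length, onlyComments.length);
```

The start key is a strict prefix of every key for that post, so it sorts before them. The `{}` entries in the end key outrank any string in the same position, so the end key sorts after them.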

The second helpful addition from CouchDB in these situations is the _list function. This would allow you to run the above results through a templating system of some kind (if you want HTML, XML, CSV or whatever back), or output a unified JSON structure if you want to be able to request an entire post's content (including author and comment data) with a single request and returned as a single JSON document that matches what your client-side/UI code needs. Doing that would allow you to request the post's unified output document this way:

/db/_design/app/_list/unified/posts?start_key=["123412804910820"]&end_key=["123412804910820", {}, {}]&include_docs=true

The _list function (in this case named "unified") takes the results of the view map/reduce (in this case named "posts") and runs them through a JavaScript function that sends back the HTTP response in the content type you need (JSON, HTML, etc).
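
A "unified" list function along these lines could be sketched as follows. `getRow`, `start`, and `send` are the helpers CouchDB makes available inside a _list function; here they are replaced by small stubs over hand-written rows (shaped as if the view had been queried with include_docs=true) so the snippet runs on its own. The merge logic and field names are assumptions, and the code uses modern JavaScript syntax, which an older CouchDB query server may not accept.

```javascript
// Stubs for the helpers CouchDB injects into a _list function.
const viewRows = [
  { key: ["123412804910820", "author", "Lance1231"], doc: { _id: "Lance1231", name: "Lance" } },
  { key: ["123412804910820", "comment", "comment1"], doc: { _id: "comment1", body: "Interesting Post" } },
  { key: ["123412804910820", "comment", "comment2"], doc: { _id: "comment2", body: "I agree" } },
  { key: ["123412804910820", "post"], doc: { _id: "123412804910820", title: "My Post", body: "Lots of Content" } },
];
let cursor = 0;
const output = [];
function getRow() {
  return cursor < viewRows.length ? viewRows[cursor++] : null;
}
function start(response) {
  // A real list function would set status and headers here; the stub ignores them.
}
function send(chunk) {
  output.push(chunk);
}

// Hypothetical list function: folds all rows for one post into a single object.
function unified(head, req) {
  start({ headers: { "Content-Type": "application/json" } });
  const post = { comments: [] };
  let row;
  while ((row = getRow())) {
    const kind = row.key[1]; // second key element names the row type
    if (kind === "post") {
      Object.assign(post, row.doc);
    } else if (kind === "author") {
      post.author = row.doc;
    } else if (kind === "comment") {
      post.comments.push(row.doc);
    }
  }
  send(JSON.stringify(post));
}

unified({}, {});
const result = JSON.parse(output.join(""));
console.log(result);
```

The client then receives one JSON object with the post's own fields plus `author` and `comments`, while the stored documents stay separate.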

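Combining those things lets you split up your documents at whatever level you find useful and "safe" for updates, conflicts, and replication, and then put them back together as needed when they're requested.

Hope that helps.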


The guidance, if I recall correctly, is to denormalize until it "hurts", while keeping in mind how frequently your documents might be updated.

  1. What rules/principles do you use to divide up your documents (relationships, etc)?

As a rule of thumb, I include all data that is needed to display a page regarding the item in question. In other words, everything you would print on a real-world piece of paper that you would hand to somebody. E.g. a stock quote document would include the name of the company, the exchange, the currency, in addition to the numbers; a contract document would include the names and addresses of the counterparties, all information on dates and signatories. But stock quotes from distinct dates would form separate documents, separate contracts would form separate documents.

  2. Is it okay to put the entire site into one document?

No, that would be silly, because:

  • you would have to read and write the whole site (the document) on each update, and that is very inefficient;
  • you would not benefit from any view caching.

I know this is an old question, but I came across it trying to figure out the best approach to this exact same problem. Christopher Lenz wrote a nice blog post about methods of modeling "joins" in CouchDB. One of my take-aways was: "The only way to allow non-conflicting addition of related data is by putting that related data into separate documents." So, for simplicity's sake you'd want to lean towards "denormalization". But you'll hit a natural barrier due to conflicting writes in certain circumstances.

In your example of Posts and Comments, if a single post and all of its comments lived in one document, then two people trying to post a comment at the same time (i.e. against the same revision of the document) would cause a conflict. This would get even worse in your "whole site in a single document" scenario.

So I think the rule of thumb would be "denormalize until it hurts", but the point where it will "hurt" is where you have a high likelihood of multiple edits being posted against the same revision of a document.
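
The difference can be shown with a toy model of the revision check. `TinyStore` below is a made-up in-memory stand-in, not CouchDB's API: it only mimics the rule that an update must carry the current `_rev`, and its "N-sketch" revision strings are invented. On a single CouchDB node a stale write is rejected with 409 Conflict, as here; between replicas the same race would surface as conflicting revisions instead.

```javascript
// Toy stand-in for CouchDB's revision check; not the real API.
class TinyStore {
  constructor() {
    this.docs = new Map();
  }

  get(id) {
    // Return a copy so each caller edits its own snapshot, like an HTTP client.
    return JSON.parse(JSON.stringify(this.docs.get(id)));
  }

  put(doc) {
    const current = this.docs.get(doc._id);
    const currentRev = current ? current._rev : undefined;
    if (doc._rev !== currentRev) {
      return { ok: false, status: 409, error: "conflict" };
    }
    const generation = current ? parseInt(current._rev, 10) + 1 : 1;
    const rev = `${generation}-sketch`;
    this.docs.set(doc._id, { ...doc, _rev: rev });
    return { ok: true, status: 201, rev };
  }
}

const store = new TinyStore();
store.put({ _id: "post1", title: "My Post", comments: [] });

// Embedded comments: both users read the same revision, then write back.
const alice = store.get("post1");
const bob = store.get("post1");
alice.comments.push({ body: "Interesting Post" });
bob.comments.push({ body: "I agree" });
const first = store.put(alice); // accepted, bumps the revision
const second = store.put(bob); // rejected, bob's _rev is now stale

// Separate comment documents: two independent writes, nothing to collide on.
const c1 = store.put({ _id: "comment1", post_id: "post1", body: "Interesting Post" });
const c2 = store.put({ _id: "comment2", post_id: "post1", body: "I agree" });

console.log(first.status, second.status, c1.status, c2.status); // 201 409 201 201
```

Bob would have to re-fetch the post and retry. The more writers target one document, the more often that retry loop runs.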


I think Jake's response nails one of the most important aspects of working with CouchDB that may help you make the scoping decision: conflicts.

In the case where you have comments as an array property of the post itself, and you just have a 'post' DB with a bunch of huge 'post' documents in it, as Jake and others correctly pointed out you could imagine a scenario on a really popular blog post where two users submit edits to the post document simultaneously, resulting in a collision and a version conflict for that document.

ASIDE: As this article points out, also consider that each time you request or update that doc you have to get or set the document in its entirety, so passing around massive documents that represent either the entire site or a post with a lot of comments on it can become a problem you would want to avoid.

In the case where posts are modeled separately from comments and two people submit a comment on a story, those simply become two "comment" documents in that DB, with no issue of conflict; just two PUT operations to add two new comments to the "comment" db.

Then to write the views that give you back the comments for a post, you would pass in the postID and then emit all the comments that reference that parent post ID, sorted in some logical ordering. Maybe you even pass in something like [postID,byUsername] as the key to the 'comments' view to indicate the parent post and how you want the results sorted or something along those lines.
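
A minimal sketch of such a view, assuming comment documents carry hypothetical `type`, `post_id`, and `username` fields. The `emit` stub and the final filter-and-sort step stand in for CouchDB, which would store the rows sorted by key and serve the range for one post directly.

```javascript
// Stand-in for CouchDB's built-in emit(), so the sketch runs under Node.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Hypothetical map function for a 'comments' view keyed by [post ID, username].
function commentsByPost(doc) {
  if (doc.type === "comment") {
    emit([doc.post_id, doc.username], doc.body);
  }
}

const docs = [
  { _id: "c1", type: "comment", post_id: "p1", username: "zoe", body: "Interesting Post" },
  { _id: "c2", type: "comment", post_id: "p1", username: "adam", body: "I agree" },
  { _id: "c3", type: "comment", post_id: "p2", username: "lance", body: "Unrelated" },
];
docs.forEach((doc) => commentsByPost(doc));

// Emulates asking the view for every key that starts with "p1".
const forPost = rows
  .filter((row) => row.key[0] === "p1")
  .sort((a, b) => (a.key[1] < b.key[1] ? -1 : a.key[1] > b.key[1] ? 1 : 0));
console.log(forPost.map((row) => row.key[1]));
```

Swapping the second key element for a timestamp would give chronological order instead.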

MongoDB handles documents a bit differently, allowing indexes to be built on sub-elements of a document, so you might see the same question on the MongoDB mailing list and someone saying "just make the comments a property of the parent post".

Because of the write locking and single-master nature of Mongo, the conflicting revision issue of two people adding comments wouldn't spring up there, and the query-ability of the content, as mentioned, isn't affected too badly because of sub-indexes.

That being said, if your sub-elements in either DB are going to be huge (say tens of thousands of comments), I believe it is the recommendation of both camps to make those separate elements; I have certainly seen that to be the case with Mongo, as there are upper limits on how big a document and its sub-elements can be.

Reference URL: https://stackoverflow.com/questions/1530745/principles-for-modeling-couchdb-documents
