
Heap vs Binary Search Tree (BST)

lottogame 2020. 6. 7. 00:43



What is the difference between a heap and a BST?

When to use a heap, and when to use a BST?

If you want to retrieve elements in sorted order, is a BST better than a heap?


Summary

          Type      BST (*)   Heap
Insert    average   log(n)    1
Insert    worst     log(n)    log(n) or n (***)
Find any  worst     log(n)    n
Find max  worst     1 (**)    1
Create    worst     n log(n)  n
Delete    worst     log(n)    log(n)

All average times on this table are the same as the worst times, except for Insert.

  • *: everywhere in this answer, BST == balanced BST, since unbalanced trees are asymptotically awful
  • **: using a trivial modification explained later in this answer
  • ***: log(n) for a pointer tree heap, n for a dynamic array heap

Advantages of binary heap over BST

  • Average insertion into a binary heap is O(1), for a BST it is O(log(n)). This is the killer feature of heaps.

    There are also other heaps which reach O(1) amortized (stronger) like the Fibonacci heap, and even O(1) worst case like the Brodal queue, although they may not be practical because of non-asymptotic performance: Are Fibonacci heaps or Brodal queues used in practice anywhere?

  • Binary heaps can be efficiently implemented on top of either dynamic arrays or pointer-based trees, while BSTs can only use pointer-based trees. So for the heap we can choose the more space-efficient array implementation, if we can afford occasional resize latencies.

  • Binary heap creation is O(n) worst case, O(n log(n)) for a BST.

Advantages of BST over binary heap

  • Search for arbitrary elements is O(log(n)). This is the killer feature of BSTs.

    For a heap, it is O(n) in general, except for the largest element, which is O(1).

"False" advantage of heap over BST

  • Heap is O(1) to find the max, BST O(log(n)).

    This is a common misconception, because it is actually easy to modify a BST to keep track of the largest element and update it whenever that element could be changed: on insertion of a larger one, swap; on removal, find the second largest. Can we use binary search tree to simulate heap operation? (asked by Yeo).

    Actually, this is a limitation of heaps compared to BSTs: the only efficient search is the one for the largest element.

Average binary heap insert is O(1)

Sources:

Intuitive argument:

  • the bottom tree levels have exponentially more elements than the top levels, so new elements are almost certain to go at the bottom
  • heap insertion starts from the bottom, while a BST must start from the top

Increasing the value at a given index of a binary heap is also O(1) on average, for the same reason. But if you want to do that, it is likely that you will want to keep an extra index up-to-date on heap operations: How to implement O(logn) decrease-key operation for min-heap based Priority Queue?, e.g. for Dijkstra. Possible at no extra time cost.

GCC C++ standard library insert benchmark on real hardware

To check if I was right about the insert times, I benchmarked inserts into the C++ std::set (a red-black tree BST) and std::priority_queue (a dynamic array heap), and this is what I got:

[Figure: insert time vs. number of elements, std::set vs std::priority_queue]

  • Benchmark code
  • Plotter script
  • Plot data
  • Tested on Ubuntu 19.04, GCC 8.3.0, on a Lenovo ThinkPad P51 laptop with CPU: Intel Core i7-7820HQ (4 cores / 8 threads, 2.90 GHz base, 8 MB cache), RAM: Samsung M471A2K43BB1-CRC (2x 16 GiB, 2400 Mbps), SSD: Samsung MZVLB512HAJQ-000L7 (512 GB, 3,000 MB/s)

So clearly:

  • Heap insert time is basically constant.

    We can clearly see the dynamic array resize points. Since we are averaging every 10k inserts to be able to see anything at all above system noise, those peaks are in fact about 10k times larger than shown!

    The zoomed graph excludes essentially only the array resize points, and shows that almost all inserts fall under 25 nanoseconds.

  • BST is logarithmic. All inserts are much slower than the average heap insert.

  • BST vs hashmap detailed analysis at: What data structure is inside std::map in C++?

GCC C++ standard library insert benchmark on gem5

gem5 is a full system simulator, and therefore provides an infinitely accurate clock with m5 dumpstats. So I tried to use it to estimate timings for individual inserts.

[Figure: gem5 per-insert timing plot]

Interpretation:

  • the heap is still constant, but now we see in more detail that there are a few lines, and that the higher lines are more sparse.

    This must correspond to memory access latencies being paid for higher and higher inserts.

  • TODO: I can't fully interpret the BST one, as it does not look so logarithmic and is somewhat more constant.

    With this greater detail we can also see a few distinct lines, but I'm not sure what they represent: I would expect the bottom line to be thinner, since we insert top to bottom?

Benchmarked with this Buildroot setup on an aarch64 HPI CPU.

BST cannot be efficiently implemented on an array

Heap operations only need to bubble up or down a single tree branch, so O(log(n)) worst case swaps, O(1) average.

Keeping a BST balanced requires tree rotations, which can change the top element for another one, and would require moving the entire array around (O(n)).

Heaps can be efficiently implemented on an array

Parent and children indexes can be computed from the current index as shown here.

There are no balancing operations as in a BST.

Delete min is the most worrying operation as it has to be top down. But it can always be done by "percolating down" a single branch of the heap as explained here. This leads to an O(log(n)) worst case, since the heap is always well balanced.

If you are inserting a single node for every one you remove, then you lose the advantage of the asymptotic O(1) average insert that heaps provide as the delete would dominate, and you might as well use a BST. Dijkstra however updates nodes several times for each removal, so we are fine.

Dynamic array heaps vs pointer tree heaps

Heaps can also be efficiently implemented on top of pointer-based trees: Is it possible to make efficient pointer-based binary heap implementations?

The dynamic array implementation is more space efficient. Suppose that each heap element contains just a pointer to a struct:

  • the tree implementation must store three pointers for each element: parent, left child and right child. So the memory usage is always 4n (3 tree pointers + 1 struct pointer).

    Tree BSTs would also need further balancing information, e.g. black-red-ness.

  • the dynamic array implementation can be of size 2n just after a doubling. So on average it is going to be 1.5n.

On the other hand, the tree heap has better worst case insert, because copying the backing dynamic array to double its size takes O(n) worst case, while the tree heap just does new small allocations for each node.

Still, the backing array doubling is O(1) amortized, so it comes down to a maximum latency consideration. Mentioned here.

Philosophy

  • BSTs maintain a global property between a parent and all descendants (left smaller, right bigger).

    The top node of a BST is the middle element, which requires global knowledge to maintain (knowing how many smaller and larger elements are there).

    This global property is more expensive to maintain (log n insert), but gives more powerful searches (log n search).

  • Heaps maintain a local property between parent and direct children (parent > children).

    The top node of a heap is the big element, which only requires local knowledge to maintain (knowing your parent).

Doubly-linked list

A doubly linked list can be seen as a subset of the heap where the first item has the greatest priority, so let's compare them here as well:

  • insertion:
    • position:
      • doubly linked list: the inserted item must be either the first or last, as we only have pointers to those elements.
      • binary heap: the inserted item can end up in any position. Less restrictive than linked list.
    • time:
      • doubly linked list: O(1) worst case since we have pointers to the items, and the update is really simple
      • binary heap: O(1) average, thus worse than linked list. Tradeoff for having more general insertion position.
  • search: O(n) for both

A use case for this is when the key of the heap is the current timestamp: in that case, new entries will always go to the beginning of the list. So we can even forget the exact timestamp altogether, and just keep the position in the list as the priority.

This can be used to implement an LRU cache. Just like for heap applications like Dijkstra, you will want to keep an additional hashmap from the key to the corresponding node of the list, to find which node to update quickly.

Comparison of different Balanced BST

Although the asymptotic insert and find times for all data structures that are commonly classified as "Balanced BSTs" that I've seen so far is the same, different BBSTs do have different trade-offs. I haven't fully studied this yet, but it would be good to summarize these trade-offs here:

  • Red-black tree. Appears to be the most commonly used BBST as of 2019, e.g. it is the one used by the GCC 8.3.0 C++ implementation
  • AVL tree. Appears to be a bit more strictly balanced than the red-black tree, so it could be better for find latency, at the cost of slightly more expensive inserts. Wiki summarizes: "AVL trees are often compared with red–black trees because both support the same set of operations and take [the same] time for the basic operations. For lookup-intensive applications, AVL trees are faster than red–black trees because they are more strictly balanced. Similar to red–black trees, AVL trees are height-balanced. Both are, in general, neither weight-balanced nor mu-balanced for any mu < 1/2; that is, sibling nodes can have hugely differing numbers of descendants."
  • WAVL. The original paper mentions advantages of that version in terms of bounds on rebalancing and rotation operations.

See also

Similar question on CS: https://cs.stackexchange.com/questions/27860/whats-the-difference-between-a-binary-search-tree-and-a-binary-heap


Heap just guarantees that elements on higher levels are greater (for max-heap) or smaller (for min-heap) than elements on lower levels, whereas BST guarantees order (from "left" to "right"). If you want sorted elements, go with BST.


When to use a heap and when to use a BST

Heap is better at findMin/findMax (O(1)), while BST is good at all finds (O(logN)). Insert is O(logN) for both structures. If you only care about findMin/findMax (e.g. priority-related), go with heap. If you want everything sorted, go with BST.

First few slides from here explain things very clearly.


As mentioned by others, a heap can do findMin or findMax in O(1), but not both in the same data structure. However, I disagree that the heap is better at findMin/findMax. In fact, with a slight modification, a BST can do both findMin and findMax in O(1).

In this modified BST, you keep track of the min node and the max node every time you do an operation that can potentially modify the data structure. For example, on insert you can check whether the min value is larger than the newly inserted value, and if so make the newly added node the min. The same technique can be applied to the max value. Hence, this BST contains this information, which you can retrieve in O(1) (same as a binary heap).

In this (balanced) BST, when you pop the min or pop the max, the next min value to be assigned is the successor of the min node, whereas the next max value is the predecessor of the max node. Thus those lookups perform in O(1). However, we still need to re-balance the tree, so the pop will still run in O(log n) (same as a binary heap).

I would be interested to hear your thoughts in the comments below. Thanks :)

Update

Cross reference to similar question Can we use binary search tree to simulate heap operation? for more discussion on simulating Heap using BST.


A binary search tree uses the definition: for every node, the node to its left has a smaller value (key) and the node to its right has a greater value (key).

Whereas the heap, being an implementation of a binary tree, uses the following definition:

If A and B are nodes, where B is a child node of A, then the value (key) of A must be larger than or equal to the value (key) of B. That is, key(A) ≥ key(B).

http://wiki.answers.com/Q/Difference_between_binary_search_tree_and_heap_tree

I ran into the same question today in my exam and I got it right. smile ... :)


Another use of a BST over a heap, because of an important difference:

  • finding the successor and predecessor in a BST takes O(h) time (O(logn) in a balanced BST)
  • while in a heap, it would take O(n) time to find the successor or predecessor of some element.

Use of a BST over a heap: say we use a data structure to store the landing times of flights. We cannot schedule a flight to land if the difference in landing times is less than 'd'. And assume many flights have already been scheduled to land in the data structure (BST or heap).

Now, we want to schedule another flight which will land at t. Hence, we need to compute the difference of t with its successor and predecessor (both should be > d). Thus, we need a BST for this, which does it fast, i.e. in O(logn) if balanced.

EDITed:

A BST takes O(n) time to print its elements in sorted order (in-order traversal), while a heap does it in O(n logn) time: the heap extracts the min element and re-heapifies the array, which makes the sorting take O(n logn).


Inserting all n elements from an array into a BST takes O(n logn). The n elements of an array can be turned into a heap in O(n) time, which gives the heap a definite advantage.


Heap just guarantees that elements on higher levels are greater (for max-heap) or smaller (for min-heap) than elements on lower levels

I love the above answer, and am adding a comment more specific to my needs and usage. I had a list of n locations, had to find the distance from each location to a specific point, say (0,0), and then return the m locations with the smallest distances. I used a priority queue, which is a heap. Finding the distances and putting them in the heap took n·log(n): n locations, log(n) per insertion. Then getting the m locations with the shortest distances took m·log(n): m deletions, each re-heapifying in log(n).

If I had to do this task with a BST, in the worst case an insert would have taken O(n) (say the first value is very small and all the others arrive sequentially larger and larger, so the tree keeps extending toward the right child; or symmetrically toward the left child for smaller and smaller values). Finding the min would have taken O(1) time, but I would have had to re-balance. So from my situation and all the answers above, what I got is: when you only need values on a min or max priority basis, go for a heap.

Reference URL: https://stackoverflow.com/questions/6147242/heap-vs-binary-search-tree-bst
