Programing

왜 SortedSet인가

lottogame 2020. 11. 19. 07:42

왜 SortedSet인가.GetViewBetween이 O (log N)가 아닌가요?

.NET 4.0 이상에서 클래스 SortedSet<T>에는 GetViewBetween(l, r)지정된 두 값 사이의 모든 값을 포함하는 트리 부분에 인터페이스 뷰를 반환하는 라는 메서드가 있습니다. 그것이 SortedSet<T>빨강-검정 트리로 구현 된다는 점을 감안할 때 자연스럽게 제 O(log N)시간 에 실행될 것으로 예상합니다 . C ++의 유사한 방법은 std::set::lower_bound/upper_bound, Java에서는 TreeSet.headSet/tailSet이며 로그입니다.

그러나 그것은 사실이 아닙니다. 다음 코드는 32 초 내에 실행되지만의 동등한 O(log N)버전은 GetViewBetween이 코드를 1-2 초 내에 실행합니다.

var s = new SortedSet<int>();
int n = 100000;
var rand = new Random(1000000007);
int sum = 0;
for (int i = 0; i < n; ++i) {
    s.Add(rand.Next());
    if (rand.Next() % 2 == 0) {
        int l = rand.Next(int.MaxValue / 2 - 10);
        int r = l + rand.Next(int.MaxValue / 2 - 10);
        var t = s.GetViewBetween(l, r);
        sum += t.Min;
    }
}
Console.WriteLine(sum);

dotPeek을 사용하여 System.dll을 디 컴파일 했으며 다음 과 같은 결과를 얻었습니다.

public TreeSubSet(SortedSet<T> Underlying, T Min, T Max, bool lowerBoundActive, bool upperBoundActive)
    : base(Underlying.Comparer)
{
    this.underlying = Underlying;
    this.min = Min;
    this.max = Max;
    this.lBoundActive = lowerBoundActive;
    this.uBoundActive = upperBoundActive;
    this.root = this.underlying.FindRange(this.min, this.max, this.lBoundActive, this.uBoundActive);
    this.count = 0;
    this.version = -1;
    this.VersionCheckImpl();
}

internal SortedSet<T>.Node FindRange(T from, T to, bool lowerBoundActive, bool upperBoundActive)
{
  SortedSet<T>.Node node = this.root;
  while (node != null)
  {
    if (lowerBoundActive && this.comparer.Compare(from, node.Item) > 0)
    {
      node = node.Right;
    }
    else
    {
      if (!upperBoundActive || this.comparer.Compare(to, node.Item) >= 0)
        return node;
      node = node.Left;
    }
  }
  return (SortedSet<T>.Node) null;
}

private void VersionCheckImpl()
{
    if (this.version == this.underlying.version)
      return;
    this.root = this.underlying.FindRange(this.min, this.max, this.lBoundActive, this.uBoundActive);
    this.version = this.underlying.version;
    this.count = 0;
    base.InOrderTreeWalk((TreeWalkPredicate<T>) (n =>
    {
      SortedSet<T>.TreeSubSet temp_31 = this;
      int temp_34 = temp_31.count + 1;
      temp_31.count = temp_34;
      return true;
    }));
}

So, FindRange is obviously O(log N), but after that we call VersionCheckImpl... which does a linear-time traversal of the found subtree only for recounting its nodes!

Why would you need to do that traversal all the time?
Why .NET does not contain a O(log N) method for splitting a tree based on key, like C++ or Java? It is really helpful in lots of situations.

about the `version` field

UPDATE1:

In my memory, a lot of(maybe all?) collections in BCL have the field version.

First,about `foreach`:

according to this msdn link

The foreach statement repeats a group of embedded statements for each element in an array or an object collection. The foreach statement is used to iterate through the collection to get the desired information, but should not be used to change the contents of the collection to avoid unpredictable side effects.

In many other collections, version is protected the data is not modified during the foreach

For example, HashTable's MoveNext():

public virtual bool MoveNext()
{
    if (this.version != this.hashtable.version)
    {
        throw new InvalidOperationException(Environment.GetResourceString("InvalidOperation_EnumFailedVersion"));
    }
    ..........
}

But in the in the SortedSet<T>'s MoveNext() method:

public bool MoveNext()
{
    this.tree.VersionCheck();
    if (this.version != this.tree.version)
    {
        ThrowHelper.ThrowInvalidOperationException(ExceptionResource.InvalidOperation_EnumFailedVersion);
    }       
    ....
}

UPDATE2：

But the O(N) loop maybe not only for version but also for the Count property.

Because the MSDN of GetViewBetween said:

This method returns a view of the range of elements that fall between lowerValue and upperValue, as defined by the comparer .... You can make changes in both the view and in the underlying SortedSet(Of T).

So for every update it should be sync the count field (key and value are already same). To make sure the Count is correct

There were two policies to reach the target:

Microsoft's
Mono's

First.MS's,in their code, they sacrifice the GetViewBetween()'s performance and win the Count Property's performance.

VersionCheckImpl() is one way to sync the Count property.

Second,Mono. In mono's code,GetViewBetween() is Faster, but in their GetCount()method:

internal override int GetCount ()
{
    int count = 0;
    using (var e = set.tree.GetSuffixEnumerator (lower)) {
        while (e.MoveNext () && set.helper.Compare (upper, e.Current) >= 0)
            ++count;
    }
    return count;
}

It is always an O(N) operation!

참고URL : https://stackoverflow.com/questions/9850975/why-sortedsett-getviewbetween-isnt-olog-n

'Programing' 카테고리의 다른 글

Spark 코드 구성 및 모범 사례 (0)	2020.11.19
HttpClient에서 압축 해제 전에 압축 된 데이터에 액세스 할 수 있습니까? (0)	2020.11.19
PostgreSQL 함수는 트랜잭션입니까? (0)	2020.11.18
Git에서 .classpath 및 .project 무시 (0)	2020.11.18
작업자 역할과 웹 작업 (0)	2020.11.18

현재글왜 SortedSet인가

lottogame 복권의 역사, 로또 정보와 IT 기술 등을 다루는 블로그입니다.

복권의 역사, 로또 정보와 IT 기술 등을 다루는 블로그입니다.

관광, Spring3, 자바, 축제, 뮤지컬, c#, java, 공연, 놀거리, 행사, JQuery, 극장순위, Javascript, spring, 무비순위, 볼거리, 연극, 여행, 가족나들이, c++,

Today :
Yesterday :

일	월	화	수	목	금	토
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30	31

내 블로그 - 관리자 홈 전환	`Q` `Q`
새 글 쓰기	`W` `W`

글 수정 (권한 있는 경우)	`E` `E`
댓글 영역으로 이동	`C` `C`

이 페이지의 URL 복사	`S` `S`
맨 위로 이동	`T` `T`
티스토리 홈 이동	`H` `H`
단축키 안내	`Shift` + `/` `⇧` + `/`

lottogame

왜 SortedSet인가

왜 SortedSet인가.GetViewBetween이 O (log N)가 아닌가요?

about the `version` field

UPDATE1:

First,about `foreach`:

UPDATE2：

'Programing' 카테고리의 다른 글

'Programing'의 다른글

티스토리툴바

단축키

내 블로그

블로그 게시글

모든 영역

왜 SortedSet인가

왜 SortedSet인가.GetViewBetween이 O (log N)가 아닌가요?

about the version field

UPDATE1:

First,about foreach:

UPDATE2：

'Programing' 카테고리의 다른 글

'Programing'의 다른글

관련글

티스토리툴바

단축키

내 블로그

블로그 게시글

모든 영역

about the `version` field

First,about `foreach`: