
At what point in the loop does integer overflow become undefined behavior?



This is an example to illustrate my question, which involves much more complicated code that I can't post here:

#include <stdio.h>
int main()
{
    int a = 0;
    for (int i = 0; i < 3; i++)
    {
        printf("Hello\n");
        a = a + 1000000000; /* overflows on the third iteration (with 32-bit int) */
    }
}

This program contains undefined behavior on my platform, because a overflows on the third time through the loop.

Does this make the whole program undefined, or only from the point at which the overflow actually happens? Could the compiler potentially work out that a will overflow, declare the whole loop undefined, and not bother to run the printfs, even though they all happen before the overflow?

(Tagged both C and C++, even though they are different languages, because I'd be interested in answers for both if they differ.)


If you're interested in a purely theoretical answer: the C++ standard allows undefined behaviour to "time travel":

[intro.execution]/5: A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).

As such, if your program contains undefined behaviour, then the behaviour of your whole program is undefined.


First, let me correct the title of this question:

Undefined Behavior is not (specifically) of the realm of execution.

Undefined Behavior affects all steps: compiling, linking, loading, and executing.

A couple of examples to cement this; keep in mind that no section is exhaustive:

  • The compiler can assume that portions of code containing Undefined Behavior are never executed, and therefore that the execution paths leading to them are dead code (see the sketch after this list). See What Every C Programmer Should Know About Undefined Behavior by Chris Lattner.
  • The linker can assume that, in the presence of multiple definitions of a weak symbol (recognized by name), all definitions are identical, thanks to the One Definition Rule.
  • The loader (in case you use dynamic libraries) can assume the same, and thus picks the first symbol it finds; this is usually (ab)used for intercepting calls via LD_PRELOAD tricks on Unixes.
  • The execution may fail (SIGSEGV) should you use dangling pointers.
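
As a sketch of the first point above (the function and array names are hypothetical, chosen purely for illustration): reading table[4] is out of bounds, hence Undefined Behavior, so an optimizer may assume the loop never reaches i == 4 without returning, and is entitled to compile the function as if it always returned 1.

static int table[4];

int contains(int v)
{
    for (int i = 0; i <= 4; i++)   /* off-by-one: i == 4 reads past the end */
        if (table[i] == v)
            return 1;
    return 0;                      /* an optimizer may treat this as unreachable */
}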

This is what is so scary about Undefined Behavior: it is nigh impossible to predict ahead of time which exact behavior will occur, and that prediction has to be revisited at every update of the toolchain, the underlying OS, and so on.


I recommend watching this talk by Michael Spencer (LLVM developer): CppCon 2016: My Little Optimizer: Undefined Behavior is Magic.


An aggressively optimising C or C++ compiler targeting a 16-bit int will know that the behaviour of adding 1000000000 to an int type is undefined.

Either standard permits it to do anything it wants, which could include the deletion of the entire program, leaving int main(){}.

But what about larger ints? I don't know of a compiler that does this yet (and I'm no expert in C and C++ compiler design by any means), but I imagine that at some point a compiler targeting a 32-bit int or larger will work out that the loop is guaranteed to reach its third iteration, and so a must eventually overflow. Once again, that would allow it to optimise the output down to int main(){}. The point I'm trying to make here is that as compiler optimisations become progressively more aggressive, more and more undefined-behaviour constructs are manifesting themselves in unexpected ways.

Note that an infinite loop would not of itself be undefined here, since you are writing to standard output in the loop body.
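
In practice, one way to see the overflow pinpointed at run time is a sanitizer build. A minimal sketch, assuming clang or gcc with UBSan available (the exact report wording varies by compiler and version):

/* overflow.c -- the question's program, unchanged */
#include <stdio.h>
int main()
{
    int a = 0;
    for (int i = 0; i < 3; i++)
    {
        printf("Hello\n");
        a = a + 1000000000;   /* the sanitizer traps this on the third iteration */
    }
}

/* Hypothetical session:
 *   cc -fsanitize=signed-integer-overflow overflow.c -o overflow
 *   ./overflow
 * prints three "Hello" lines, then reports something like:
 *   runtime error: signed integer overflow: 2000000000 + 1000000000
 *   cannot be represented in type 'int'
 */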


Technically, under the C++ standard, if a program contains undefined behavior, the behavior of the entire program, even at compile time (before the program is even executed), is undefined.

In practice, because the compiler may assume (as part of an optimization) that the overflow will not occur, at least the behavior of the program on the third iteration of the loop (assuming a 32-bit machine) will be undefined, though it is likely that you will get correct results before the third iteration. However, since the behavior of the entire program is technically undefined, there's nothing stopping the program from generating completely incorrect output (including no output), crashing at runtime at any point during execution, or even failing to compile altogether (as undefined behavior extends to compile time).

Undefined behavior gives the compiler more room to optimize, because it removes constraints on what the code must do. In exchange, programs that rely on assumptions involving undefined behavior are not guaranteed to work as expected. As such, you should not rely on any particular behavior that is considered undefined per the C++ standard.


To understand why undefined behavior can 'time travel', as @TartanLlama aptly put it, let's take a look at the 'as-if' rule:

1.9 Program execution

1 The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine. This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.

With this, we could view the program as a 'black box' with an input and an output. The input could be user-input, files, and many other things. The output is the 'observable behavior' mentioned in the standard.

The standard only defines a mapping between the input and the output, nothing else. It does this by describing an 'example black box', but explicitly says any other black box with the same mapping is equally valid. This means the content of the black box is irrelevant.

With this in mind, it would not make sense to say that undefined behavior occurs at a certain moment. In the sample implementation of the black box, we could say where and when it happens, but the actual black box could be something completely different, so we can't say where and when it happens anymore. Theoretically, a compiler could for example decide to enumerate all the possible inputs, and pre-compute the resulting outputs. Then the undefined behavior would have happened during compilation.

Undefined behavior is the absence of a mapping between input and output. A program can have undefined behavior for some inputs but defined behavior for others. Then the mapping between input and output is simply incomplete; there is input for which no mapping to output exists.
The program in the question has undefined behavior for any input, so the mapping is empty.
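
A tiny illustration of such a partial mapping, as a hypothetical program (assuming 32-bit int): the output is defined for every successfully read input except INT_MAX, for which the addition overflows and no mapping to an output exists.

#include <stdio.h>
#include <limits.h>

int main(void)
{
    int x;
    if (scanf("%d", &x) != 1)
        return 1;
    /* Defined for every x < INT_MAX; for x == INT_MAX the addition
     * overflows, so the input-to-output mapping has a hole there. */
    printf("%d\n", x + 1);
    return 0;
}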


Assuming int is 32-bit, undefined behavior happens at the third iteration. So if, for example, the loop was only conditionally reachable, or could conditionally be terminated before the third iteration, there would be no undefined behavior unless the third iteration is actually reached. However, in the event of undefined behavior, all output of the program is undefined, including output which is "in the past" relative to the invocation of undefined behavior. For example, in your case, this means there is no guarantee of seeing 3 "Hello" messages in the output.
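
To make the reachability point concrete, here is a hypothetical variation of the question's program in which the overflow is only conditionally reached: for inputs n < 3 every execution is fully defined, and only runs with n >= 3 reach the undefined third iteration.

#include <stdio.h>

int main(void)
{
    int n;
    if (scanf("%d", &n) != 1)
        return 1;
    int a = 0;
    for (int i = 0; i < n; i++)   /* loop bound now depends on input */
    {
        printf("Hello\n");
        a = a + 1000000000;       /* overflows only on the third iteration */
    }
}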


TartanLlama's answer is correct. The undefined behavior can happen at any time, even during compile time. This may seem absurd, but it's a key feature that permits compilers to do what they need to do. It's not always easy to be a compiler: you have to do exactly what the spec says, every time, yet it can sometimes be monstrously difficult to prove that a particular behavior is occurring. If you remember the halting problem, it's rather trivial to write software for which you cannot prove whether it completes or enters an infinite loop when fed a particular input.

We could make compilers pessimistic, constantly compiling in fear that the next instruction might be one of these halting-problem-like issues, but that isn't reasonable. Instead we give the compiler a pass: on these "undefined behavior" topics, it is freed from any responsibility. Undefined behavior consists of all the behaviors which are so subtly nefarious that we have trouble separating them from the really-nasty-nefarious halting problems and whatnot.

There is an example which I love to post, though I admit I lost the source, so I have to paraphrase. It is from a particular version of MySQL. In MySQL, they had a circular buffer which was filled with user-provided data. They, of course, wanted to make sure that the data didn't overflow the buffer, so they had a check:

if (currentPtr + numberOfNewChars > endOfBufferPtr) { doOverflowLogic(); }

It looks sane enough. However, what if numberOfNewChars is really big, and overflows? Then it wraps around and becomes a pointer smaller than endOfBufferPtr, so the overflow logic would never get called. So they added a second check, before that one:

if (currentPtr + numberOfNewChars < currentPtr) { detectWrapAround(); }

It looks like that takes care of the buffer overflow, right? However, a bug report came in stating that the buffer overflowed on a particular version of Debian! Careful investigation showed that this version of Debian was the first to use a particularly bleeding-edge version of gcc. On this version of gcc, the compiler recognized that currentPtr + numberOfNewChars can never be a smaller pointer than currentPtr, because overflow in pointer arithmetic is undefined behavior! That was sufficient for gcc to optimize out the entire check, and suddenly you were not protected against buffer overflows, even though you wrote the code to check for them!

This was spec behavior. Everything was legal (though from what I heard, gcc rolled back this change in the next version). It's not what I would consider intuitive behavior, but if you stretch your imagination a bit, it's easy to see how a slight variant of this situation could become a halting problem for the compiler. Because of this, the spec writers made it "Undefined Behavior" and stated that the compiler could do absolutely anything it pleased.
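
For comparison, a check along these lines avoids forming the overflowing pointer in the first place. A minimal sketch (the names follow the story above and are hypothetical):

#include <stddef.h>

/* Compare against the space remaining instead of computing
 * currentPtr + numberOfNewChars, which can overflow. The difference
 * endOfBufferPtr - currentPtr is well-defined as long as currentPtr
 * points into (or one past the end of) the buffer and does not
 * exceed endOfBufferPtr. */
int wouldOverflow(const char *currentPtr, const char *endOfBufferPtr,
                  size_t numberOfNewChars)
{
    return numberOfNewChars > (size_t)(endOfBufferPtr - currentPtr);
}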


Beyond the theoretical answers, a practical observation would be that for a long time compilers have applied various transforms upon loops to reduce the amount of work done within them. For example, given:

for (int i=0; i<n; i++)
  foo[i] = i*scale;

a compiler might transform that into:

int temp = 0;
for (int i=0; i<n; i++)
{
  foo[i] = temp;
  temp+=scale;
}

Thus saving a multiplication on every loop iteration. An additional form of optimization, which compilers adopted with varying degrees of aggressiveness, would turn that into:

if (n > 0)
{
  int temp1 = n*scale;
  int *temp2 = foo + n;   /* fill the array from the top down */
  do
  {
    temp1 -= scale;
    *--temp2 = temp1;     /* stores (n-1)*scale ... down to 0 at foo[0] */
  } while(temp1);
}

Even on machines with silent wraparound on overflow, that could malfunction if there was some number less than n which, when multiplied by scale, would yield 0. It could also turn into an endless loop if scale was read from memory more than once and something changed its value unexpectedly (in any case where "scale" could change mid-loop without invoking UB, a compiler would not be allowed to perform the optimization).

While most such optimizations would not cause any trouble in cases where two short unsigned types are multiplied to yield a value between INT_MAX+1 and UINT_MAX, gcc has some cases where such a multiplication within a loop may cause the loop to exit early. I haven't noticed such behaviors stemming from comparison instructions in generated code, but it is observable in cases where the compiler uses the overflow to infer that a loop can execute at most 4 or fewer times; it does not, by default, generate warnings in cases where some inputs would cause UB and others would not, even when its inferences cause the upper bound of the loop to be ignored.
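
The hazard referred to here is integer promotion. A minimal sketch (hypothetical function, assuming 32-bit int so that uint16_t promotes to int):

#include <stdint.h>

/* Both uint16_t operands promote to (signed) int before the multiply,
 * so any product above INT_MAX is signed integer overflow -- undefined
 * behavior despite the unsigned-looking operand types. */
uint32_t mul(uint16_t a, uint16_t b)
{
    return a * b;                /* e.g. 65535 * 65535 overflows int */
    /* portable fix: return (uint32_t)a * b; */
}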


Undefined behavior is, by definition, a grey area. You simply can't predict what it will or won't do -- that's what "undefined behavior" means.

Since time immemorial, programmers have always tried to salvage remnants of definedness from an undefined situation. They've got some code they really want to use, but which turns out to be undefined, so they try to argue: "I know it's undefined, but surely it will, at worst, do this or this; it will never do that." And sometimes these arguments are more or less right -- but often, they're wrong. And as the compilers get smarter and smarter (or, some people might say, sneakier and sneakier), the boundaries of the question keep changing.

So really, if you want to write code that's guaranteed to work, and that will keep working for a long time, there's only one choice: avoid ye the undefined behavior at all costs. Verily, if you dabble in it, it will come back to haunt you.


One thing your example doesn't consider is optimisation. a is set in the loop but never used, and an optimiser could work this out. As such, it is legitimate for the optimiser to discard a completely, and in that case all undefined behaviour vanishes like a boojum's victim.

However of course this itself is undefined, because optimisation is undefined. :)


Since this question is dual-tagged C and C++, I will try to address both. C and C++ take different approaches here.

In C, the implementation must be able to prove that the undefined behavior will be invoked in order to treat the whole program as if it had undefined behavior. In the OP's example it would seem trivial for the compiler to prove that, and therefore it is as if the whole program were undefined.

We can see this from Defect Report 109 which at its crux asks:

If however the C Standard recognizes the separate existence of "undefined values" (whose mere creation does not involve wholly "undefined behavior") then a person doing compiler testing could write a test case such as the following, and he/she could also expect (or possibly demand) that a conforming implementation should, at the very least, compile this code (and possibly also allow it to execute) without "failure."

int array1[5];
int array2[5];
int *p1 = &array1[0];
int *p2 = &array2[0];

int foo()
{
    int i;
    i = (p1 > p2); /* Must this be "successfully translated"? */
    1/0;           /* Must this be "successfully translated"? */
    return 0;
}

So the bottom line question is this: Must the above code be "successfully translated" (whatever that means)? (See the footnote attached to subclause 5.1.1.3.)

and the response was:

The C Standard uses the term "indeterminately valued" not "undefined value." Use of an indeterminate valued object results in undefined behavior. The footnote to subclause 5.1.1.3 points out that an implementation is free to produce any number of diagnostics as long as a valid program is still correctly translated. If an expression whose evaluation would result in undefined behavior appears in a context where a constant expression is required, the containing program is not strictly conforming. Furthermore, if every possible execution of a given program would result in undefined behavior, the given program is not strictly conforming. A conforming implementation must not fail to translate a strictly conforming program simply because some possible execution of that program would result in undefined behavior. Because foo might never be called, the example given must be successfully translated by a conforming implementation.

In C++ the approach seems more relaxed and would suggest a program has undefined behavior regardless of whether the implementation can prove it statically or not.

We have [intro.abstract]p5, which says:

A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this document places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).


The top answer rests on a wrong (but common) misconception:

Undefined behavior is a run-time property*. It CANNOT "time-travel"!

Certain operations are defined (by the standard) to have side-effects and cannot be optimized away. Operations that do I/O or that access volatile variables fall in this category.

However, there is a caveat: UB can be any behavior, including behavior that undoes previous operations. This can have similar consequences, in some cases, to optimizing out earlier code.

In fact, this is consistent with the quote in the top answer (emphasis mine):

A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input.
However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).

Yes, this quote does say "not even with regard to operations preceding the first undefined operation", but notice that this is specifically about code that is being executed, not merely compiled.
After all, undefined behavior that isn't actually reached doesn't do anything, and for the line containing UB to be actually reached, code that precedes it must execute first!

So yes, once UB is executed, any effects of previous operations become undefined. But until that happens, the execution of the program is well-defined.

Note, however, that all executions of the program that result in UB can be optimized into equivalent programs, including ones that perform the earlier operations and then undo their effects. Consequently, preceding code may be optimized away whenever doing so would be equivalent to its effects being undone; otherwise, it can't. See below for an example.

*Note: This is not inconsistent with UB occurring at compile time. If the compiler can indeed prove that UB code will always be executed for all inputs, then UB can extend to compile time. However, this requires knowing that all previous code eventually returns, which is a strong requirement. Again, see below for an example/explanation.


To make this concrete, note that the following code must print foo and wait for your input regardless of any undefined behavior that follows it:

printf("foo");
getchar();
*(char*)1 = 1;

However, also note that there is no guarantee that foo will remain on the screen after the UB occurs, or that the character you typed will no longer be in the input buffer; both of these operations can be "undone", which has a similar effect to UB "time-travel".

If the getchar() line wasn't there, it would be legal for the lines to be optimized away if and only if that would be indistinguishable from outputting foo and then "un-doing" it.

Whether or not the two would be indistinguishable would depend entirely on the implementation (i.e. on your compiler and standard library). For example, can your printf block your thread here while waiting for another program to read the output? Or will it return immediately?

  • If it can block here, then another program can refuse to read its full output, and it may never return, and consequently UB may never actually occur.

  • If it can return immediately here, then we know it must return, and therefore optimizing it out is entirely indistinguishable from executing it and then un-doing its effects.

Of course, since the compiler knows what behavior is permissible for its particular version of printf, it can optimize accordingly, and consequently printf may get optimized out in some cases and not others. But, again, the justification is that this would be indistinguishable from the UB un-doing previous operations, not that the previous code is "poisoned" because of UB.

Source: https://stackoverflow.com/questions/39914788/at-what-point-in-the-loop-does-integer-overflow-become-undefined-behavior
