Programming

Is there still a reason to use `int` in C++ code?

lottogame 2020. 5. 18. 08:03

Is there still a reason to use `int` in C++ code? [duplicate]


Many style guides, such as Google's, recommend using `int` as the default integer, for example when indexing arrays. With the rise of 64-bit platforms, `int` is in most cases only 32 bits, which is not the native width of the platform. As a consequence, I see no reason, apart from simplicity, to keep that choice. We can clearly see this when we compile the following code:

double get(const double* p, int k) {
  return p[k];
}

This compiles to

movslq %esi, %rsi
vmovsd (%rdi,%rsi,8), %xmm0
ret

where the first instruction promotes the 32-bit integer to a 64-bit integer.

If the code is instead

double get(const double* p, std::ptrdiff_t k) {
  return p[k];
}

the generated assembly is now

vmovsd (%rdi,%rsi,8), %xmm0
ret

which clearly shows that the CPU feels more at home with `std::ptrdiff_t` than with `int`. Many C++ users have moved to `std::size_t`, but I do not want to use unsigned integers unless I really need modulo `2^n` behavior.

In most cases, using `int` does not hurt performance, because the undefined behavior of signed integer overflow allows the compiler to internally promote an `int` index to `std::ptrdiff_t` in loops, but we clearly see above that the compiler does not feel at home with `int`. Also, using `std::ptrdiff_t` on 64-bit platforms makes overflow less likely, as I see more and more people getting trapped by `int` overflow when they have to deal with integers larger than `2^31 - 1`, which is really common these days.

From what I have seen, the only thing that makes `int` stand apart is the fact that literals such as `5` are of type `int`, but I do not see where that could cause a problem if we moved to `std::ptrdiff_t` as the default integer.

I am on the verge of making `std::ptrdiff_t` the de facto standard integer for all code written in my small company. Is there a reason why that could be a bad choice?

PS: I agree with the fact that the name `std::ptrdiff_t` is ugly, which is why I have typedef'ed it to `il::int_t`, which looks a bit better.
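The typedef described in this PS might look like the following sketch; `il` is the questioner's own namespace, so the exact spelling here is an assumption:

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical shorthand alias for std::ptrdiff_t, as described above.
namespace il {
    using int_t = std::ptrdiff_t;
}

// The alias is just a nicer name; it is the same type.
static_assert(std::is_same<il::int_t, std::ptrdiff_t>::value,
              "il::int_t is an alias, not a new type");
```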

PS: As I know many people will recommend using `std::size_t` as the default integer, I want to make it clear that I do not want to use an unsigned integer as my default integer. Using `std::size_t` as the default integer of the STL was a mistake, as acknowledged by Bjarne Stroustrup and the standard committee in the Interactive Panel video, at 42:38 and 1:02:50.

PS: In terms of performance, on any 64-bit platform I am aware of, `+`, `-` and `*` get compiled the same way for both `int` and `std::ptrdiff_t`, so there is no difference in speed. If you divide by a compile-time constant, the speed is also the same. It is only when you divide `a/b` knowing nothing about `b` that using 32-bit integers on a 64-bit platform gives you a slight performance advantage. But this case is so rare that I do not see it as a reason to move away from `std::ptrdiff_t`. When we deal with vectorized code, there is a clear difference, and the smaller the better, but that is a different story, and there would be no reason to stick with `int`; for this case I would advise moving to the fixed-size types of C++.


There was a discussion about this for the C++ Core Guidelines:

https://github.com/isocpp/CppCoreGuidelines/pull/1115

Herb Sutter wrote that gsl::index (which in the future may become std::index) will be added, and that it will be defined as ptrdiff_t.

hsutter commented on 26 Dec 2017 •

(Thanks to many WG21 experts for their comments and feedback on this note.)

Add the following typedef to GSL:

namespace gsl { using index = ptrdiff_t; }

and recommend gsl::index for all container indexes/subscripts/sizes.

Rationale

The Guidelines recommend using a signed type for subscripts/indices. See ES.100 through ES.107. C++ already uses signed integers for array subscripts.

We want to be able to teach people to write "new clean modern code" that is simple, natural, and warning-free at high warning levels, and to stop having to write "pitfall" footnotes about simple code.

If we don't have a short adoptable word like index that is competitive with int and auto, people will still use int and auto and get their bugs. For example, they will write for(int i=0; i<v.size(); ++i) or for(auto i=0; i<v.size(); ++i), which have 32-bit size bugs on widely used platforms, or for(auto i=v.size()-1; i>=0; ++i), which just doesn't work. I don't think we can teach for(ptrdiff_t i = ... with a straight face, or that people would accept it.
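The third bug pattern above is worth spelling out: with an unsigned index, `i >= 0` is always true, so a decrementing loop never terminates. A minimal sketch of the failure mode and a correct signed version:

```cpp
#include <cstddef>
#include <vector>

// Correct descending loop using a signed index: i >= 0 is a real
// termination condition because std::ptrdiff_t can go negative.
int count_down_visits(const std::vector<int>& v) {
    int visits = 0;
    for (std::ptrdiff_t i = static_cast<std::ptrdiff_t>(v.size()) - 1; i >= 0; --i)
        ++visits;
    return visits;
}

// Why for(auto i = v.size()-1; i >= 0; --i) never ends: decrementing an
// unsigned zero wraps around to SIZE_MAX instead of becoming -1.
bool unsigned_wraps() {
    std::size_t i = 0;
    --i;            // wraps to the largest std::size_t value
    return i > 0;   // always true, so the loop condition i >= 0 never fails
}
```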

If we had a saturating arithmetic type, we could use that. Otherwise, the best option is ptrdiff_t, which has nearly all the advantages of a saturating arithmetic unsigned type, except only that ptrdiff_t still makes the pervasive loop style for(ptrdiff_t i=0; i<v.size(); ++i) emit signed/unsigned mismatch warnings on i<v.size() (and similarly for i!=v.size()) with today's STL containers. (If a future STL changes its size_type to be signed, even this last drawback goes away.)

However, teaching people to routinely write for (ptrdiff_t i = ... ; ... ; ...) would be hopeless and embarrassing. (Even the Guidelines currently use it in only one place, and that is a "bad" example unrelated to indexing.)

Therefore, we should provide gsl::index (which can later be proposed for consideration as std::index) as a typedef for ptrdiff_t, so that we can hopefully (and not embarrassingly) teach people to routinely write for (index i = ... ; ... ; ...).

Why not just tell people to write ptrdiff_t? Because we believe it would be embarrassing to tell people that is what you have to do in C++, and even if we told them, people won't do it. Writing ptrdiff_t is too ugly and unadoptable compared to auto and int. The point of adding the name index is to make it as easy and attractive as possible to use a correctly sized signed type.

Edit: More rationale from Herb Sutter

Is ptrdiff_t big enough? Yes. Standard containers are already required to have no more elements than can be represented by ptrdiff_t, because subtracting two iterators must fit in a difference_type.

But is ptrdiff_t really big enough, if I have a built-in array of char or byte that is bigger than half the size of the memory address space and so has more elements than can be represented in a ptrdiff_t? Yes. C++ already uses signed integers for array subscripts. So use index as the default option for the vast majority of uses including all built-in arrays. (If you do encounter the extremely rare case of an array, or array-like type, that is bigger than half the address space and whose elements are sizeof(1), and you're careful about avoiding truncation issues, go ahead and use a size_t for indexes into that very special container only. Such beasts are very rare in practice, and when they do arise often won't be indexed directly by user code. For example, they typically arise in a memory manager that takes over system allocation and parcels out individual smaller allocations that its users use, or in an MPEG or similar which provides its own interface; in both cases the size_t should only be needed internally within the memory manager or the MPEG class implementation.)


I come at this from the perspective of an old timer (pre C++)... It was understood back in the day that int was the native word of the platform and was likely to give the best performance.

If you needed something bigger, then you'd use it and pay the price in performance. If you needed something smaller (limited memory, or specific need for a fixed size), same thing.. otherwise use int. And yeah, if your value was in the range where int on one target platform could accommodate it and int on another target platform could not.. then we had our compile time size specific defines (prior to them becoming standardized we made our own).

But now, present day, processors and compilers are much more sophisticated and these rules don't apply so easily. It is also harder to predict what the performance impact of your choice will be on some unknown future platform or compiler ... How do we really know that uint64_t for example will perform better or worse than uint32_t on any particular future target? Unless you're a processor/compiler guru, you don't...

So... maybe it's old fashioned, but unless I am writing code for a constrained environment like Arduino, etc. I still use int for general purpose values that I know will be within int size on all reasonable targets for the application I am writing. And the compiler takes it from there... These days that generally means 32 bits signed. Even if one assumes that 16 bits is the minimum integer size, it covers most use cases.. and the use cases for numbers larger than that are easily identified and handled with appropriate types.


Most programs do not live and die on the edge of a few CPU cycles, and int is very easy to write. However, if you are performance-sensitive, I suggest using the fixed-width integer types defined in <cstdint>, such as int32_t or uint64_t. These have the benefit of being very clear in their intended behavior in regards to being signed or unsigned, as well as their size in memory. This header also includes the fast variants such as int_fast32_t, which are at least the stated size, but might be more, if it helps performance.
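The types this answer mentions are all in the standard `<cstdint>` header; a quick sketch of the exact-width and "fast" variants:

```cpp
#include <cstdint>

// Exact-width types: guaranteed size and signedness.
std::int32_t  exact = 100000;     // exactly 32 bits, signed
std::uint64_t big   = 1ull << 40; // exactly 64 bits, unsigned

// "Fast" types: at least the stated width, wider if that is faster.
std::int_fast32_t quick = 7;

static_assert(sizeof(std::int32_t) == 4, "exact-width type");
static_assert(sizeof(std::uint64_t) == 8, "exact-width type");
static_assert(sizeof(std::int_fast32_t) >= 4, "at least 32 bits");
```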


No formal reason to use int. It doesn't correspond to anything sane as per standard. For indices you almost always want signed pointer-sized integer.

That said, typing int feels like you just said hey to Ritchie and typing std::ptrdiff_t feels like Stroustrup just kicked you in the butt. Coders are people too, don't bring too much ugliness into their life. I would prefer to use long or some easily typed typedef like index instead of std::ptrdiff_t.


This is somewhat opinion-based, but alas, the question somewhat begs for it, too.

First of all, you talk about integers and indices as if they were the same thing, which is not the case. For any such thing as "integer of sorts, not sure what size", simply using int is of course, most of the time, still appropriate. This works fine most of the time, for most applications, and the compiler is comfortable with it. As a default, that's fine.

For array indices, it's a different story.

There is to date one single formally correct thing, and that's std::size_t. In the future, there may be a std::index_t which makes the intent clearer on the source level, but so far there is not.
std::ptrdiff_t as an index "works" but is just as incorrect as int since it allows for negative indices.
Yes, this happens to be what Mr. Sutter deems correct, but I beg to differ. Yes, at the assembly language instruction level, negative indices are supported just fine, but I still object. The standard says:

8.3.4/6: E1[E2] is identical to *((E1)+(E2)) [...] Because of the conversion rules that apply to +, if E1 is an array and E2 an integer, then E1[E2] refers to the E2-th member of E1.
5.7/5: [...] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object [...] otherwise, the behavior is undefined.

An array subscription refers to the E2-th member of E1. There is no such thing as a negative-th element of an array. But more importantly, the pointer arithmetic with a negative additive expression invokes undefined behavior.

In other words: signed indices of whatever size are a wrong choice. Indices are unsigned. Yes, signed indices work, but they're still wrong.

Now, although size_t is by definition the correct choice (an unsigned integer type that is large enough to contain the size of any object), it may be debatable whether it is truly good choice for the average case, or as a default.

Be honest, when was the last time you created an array with 10^19 elements?

I am personally using unsigned int as a default because the 4 billion elements that this allows for is way enough for (almost) every application, and it already pushes the average user's computer rather close to its limit (if merely subscribing an array of integers, that assumes 16GB of contiguous memory allocated). I personally deem defaulting to 64-bit indices as ridiculous.

If you are programming a relational database or a filesystem, then yes, you will need 64-bit indices. But for the average "normal" program, 32-bit indices are just good enough, and they only consume half as much storage.

When keeping around considerably more than a handful of indices, and if I can afford (because arrays are not larger than 64k elements), I even go down to uint16_t. No, I'm not joking there.

Is storage really such a problem? It's ridiculous to be greedy about two or four bytes saved, isn't it! Well, no...

Size can be a problem for pointers, so sure enough it can be for indices as well. The x32 ABI does not exist for no reason. You will not notice the overhead of needlessly large indices if you have only a handful of them in total (just like pointers, they will be in registers anyway, nobody will notice whether they're 4 or 8 bytes in size).

But think for example of a slot map where you store an index for every element (depending on the implementation, two indices per element). Oh heck, it sure does make a bummer of a difference whether you hit L2 every time, or whether you have a cache miss on every access! Bigger is not always better.
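The slot-map point can be made concrete with a sketch (the struct layout is illustrative, not any particular library's): two indices per element means the index width directly sets the per-element footprint and thus how many entries fit in a cache line.

```cpp
#include <cstdint>

// Per-element records as in a slot map: an index into the data array
// plus a free-list link. With 32-bit indices each record is 8 bytes;
// with 64-bit indices it is 16 bytes, so half as many records fit in
// each cache line.
struct Slot32 { std::uint32_t data_index; std::uint32_t next_free; };
struct Slot64 { std::uint64_t data_index; std::uint64_t next_free; };

static_assert(sizeof(Slot32) == 8,  "two 32-bit indices");
static_assert(sizeof(Slot64) == 16, "twice the cache footprint");
```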

At the end of the day, you must ask yourself what you pay for, and what you get in return. With that in mind, my style recommendation would be:

If it costs you "nothing" because you only have e.g. one pointer and a few indices to keep around, then just use what's formally correct (that'd be size_t). Formally correct is good, correct always works, it's readable and intelligible, and correct is... never wrong.

If, however, it does cost you (you have maybe several hundred or thousand or ten thousand indices), and what you get back is worth nothing (because e.g. you cannot even store 2^20 elements, so whether you could subscribe 2^32 or 2^64 makes no difference), you should think twice about being too wasteful.


On most modern 64-bit architectures, int is 4 bytes and ptrdiff_t is 8 bytes. If your program uses a lot of integers, using ptrdiff_t instead of int could double your program's memory requirement.

Also consider that modern CPUs are frequently bottlenecked by memory performance. Using 8-byte integers also means your CPU cache now has half as many elements as before, so now it must wait for the slow main memory more often (which can easily take several hundred cycles).

In many cases, the cost of executing "32-to-64-bit conversion" operations is completely dwarfed by memory performance.

So this is a practical reason int is still popular on 64-bit machines.

  • Now you may argue about two dozen different integer types and portability and standard committees and everything, but the truth is that for a lot of C++ programs written out there, there's a "canonical" architecture they're thinking of, which is frequently the only architecture they're ever concerned about. (If you're writing a 3D graphics routine for a Windows game, you're sure it won't run on an IBM mainframe.) So for them, the question boils down to: "Do I need a 4-byte integer or an 8-byte one here?"

My advice to you is not to look at assembly language output too much, not to worry too much about exactly what size each variable is, and not to say things like "the compiler feels at home with". (I truly don't know what you mean by that last one.)

For garden-variety integers, the ones that most programs are full of, plain int is supposed to be a good type to use. It's supposed to be the natural word size of the machine. It's supposed to be efficient to use, neither wasting unnecessary memory nor inducing lots of extra conversions when moving between memory and computation registers.

Now, it's true that there are plenty of more specialized uses for which plain int is no longer appropriate. In particular, sizes of objects, counts of elements, and indices into arrays are almost always size_t. But that doesn't mean all integers should be size_t!

It's also true that mixtures of signed and unsigned types, and mixtures of different-size types, can cause problems. But most of those are well taken care of by modern compilers and the warnings they emit for unsafe combinations. So as long as you're using a modern compiler and paying attention to its warnings, you don't need to pick an unnatural type just to try to avoid type mismatch problems.


I don't think that there's a real reason for using int.

How to choose the integer type?

  • If it is for bit operations, you can use an unsigned type, otherwise use a signed one
  • If it is for memory-related thing (index, container size, etc.), for which you don't know the upper bound, use std::ptrdiff_t (the only problem is when size is larger than PTRDIFF_MAX, which is rare in practice)
  • Otherwise use intXX_t or int(_least)/(_fast)XX_t.

These rules cover all the possible usages for int, and they give a better solution:

  • int is not good for storing memory related things, as its range can be smaller than an index can be (this is not a theoretical thing: for 64-bit machines, int is usually 32-bit, so with int, you can only handle 2 billion elements)
  • int is not good for storing "general" integers, as its range may be smaller than needed (undefined behavior happens if range is not enough), or on the contrary, its range may be much larger than needed (so memory is wasted)

The only reason one could use an int is if one does a calculation and knows that the range fits into [-32767; 32767] (the standard only guarantees this range; note, however, that implementations are free to provide bigger ints, and they usually do so: currently int is 32-bit on a lot of platforms).

As the mentioned std types are a little bit tedious to write, one could typedef them to be shorter (I use s8/u8/.../s64/u64, and spt/upt ("(un)signed pointer sized type") for ptrdiff_t/size_t. I've been using these typedefs for 15 years, and I've never written a single int since...).
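The shorthand aliases this answerer describes (the names s8/u8/.../spt/upt are their personal convention, not standard ones) might be defined like this:

```cpp
#include <cstddef>
#include <cstdint>

// Personal shorthand aliases as described in the answer above.
using s8  = std::int8_t;    using u8  = std::uint8_t;
using s16 = std::int16_t;   using u16 = std::uint16_t;
using s32 = std::int32_t;   using u32 = std::uint32_t;
using s64 = std::int64_t;   using u64 = std::uint64_t;
using spt = std::ptrdiff_t; // "signed pointer-sized type"
using upt = std::size_t;    // "unsigned pointer-sized type"
```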


Pro

Easier to type, I guess? But you can always typedef.

Many APIs use int, including parts of the standard library. This has historically caused problems, for example during the transition to 64-bit file sizes.

Because of the default type promotion rules, types narrower than int could be widened to int or unsigned int unless you add explicit casts in a lot of places, and a lot of different types could be narrower than int on some implementation somewhere. So, if you care about portability, it’s a minor headache.
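The promotion rules mentioned here are easy to demonstrate: arithmetic on operands narrower than int yields int, which `decltype` makes visible at compile time.

```cpp
#include <type_traits>

// Integer promotion: arithmetic on short operands produces int.
short a = 1, b = 2;
static_assert(std::is_same<decltype(a + b), int>::value,
              "short + short promotes to int");

// Even unsigned char promotes to signed int, because int can
// represent all of unsigned char's values on common platforms.
unsigned char c = 255;
static_assert(std::is_same<decltype(c + c), int>::value,
              "unsigned char + unsigned char promotes to int");
```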

Con

I also use ptrdiff_t for indices, most of the time. (I agree with Google that unsigned indices are a bug attractor.) For other kinds of math, there's int_fast64_t, int_fast32_t, and so on, which will also be as good as or better than int. Almost no real-world systems, with the exception of a few defunct Unices from last century, use ILP64, but there are plenty of CPUs where you would want 64-bit math. And a compiler is technically allowed, by the standard, to break your program if your int is greater than 32,767.

That said, any C compiler worth its salt will be tested on a lot of code that adds an int to a pointer within an inner loop, so it can't do anything too dumb. The worst-case scenario on present-day hardware is that it needs an extra instruction to sign-extend a 32-bit signed value to 64 bits. But if what you really want is the fastest pointer math, the fastest math for values with magnitude between 32 kibi and 2 gibi, or the least wasted memory, you should say what you mean, not make the compiler guess.


I guess in 99% of cases there is no reason to use int (or a signed integer of another size). However, there are still situations when using int is a good option.


A) Performance:

One difference between int and size_t is that i++ can be undefined behavior for int, namely when i is INT_MAX. This actually might be a good thing, because the compiler can use this undefined behavior to speed things up.

For example in this question the difference was about factor 2 between exploiting the undefined behavior and using compiler flag -fwrapv which prohibits this exploit.

If my workhorse for-loop becomes twice as fast by using ints, then sure, I will use them.
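A sketch of the kind of loop this section is about: with a signed int index, the compiler may assume the increment never wraps (overflow would be undefined behavior), which can enable strength reduction or vectorization that -fwrapv, or a wrapping unsigned index, would forbid.

```cpp
#include <cstddef>

// Signed int index: the compiler may assume i never overflows, so the
// address computation p + i can be rewritten freely across iterations.
double sum(const double* p, int n) {
    double s = 0.0;
    for (int i = 0; i < n; ++i)
        s += p[i];
    return s;
}
```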


B) Less error prone code

Reversed for-loops with size_t look strange and are a source of errors (I hope I got this one right):

for(size_t i = N-1; i < N; i--){...}

By using

for(int i = N-1; i >= 0; i--){...}

you will deserve the gratitude of less experienced C++-programmers, who will have to manage your code some day.


C) Design using signed indices

By using int for indices, one can signal wrong or out-of-range values with negative values, something that comes in handy and can lead to clearer code.

  1. "find index of an element in array" could return -1 if element is not present. For detecting this "error" you don't have to know the size of the array.

  2. binary search could return positive index if element is in the array, and -index for the position where the element would be inserted into array (and is not in the array).

Clearly, the same information could be encoded with positive index-values, but the code becomes somewhat less intuitive.
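The first pattern above can be sketched in a few lines: -1 is impossible as a real index, so the caller can detect "not found" without knowing the array's size.

```cpp
#include <cstddef>

// Linear search returning a signed index: -1 unambiguously means
// "not found", with no need to compare against the array's size.
std::ptrdiff_t find_index(const int* a, std::ptrdiff_t n, int value) {
    for (std::ptrdiff_t i = 0; i < n; ++i)
        if (a[i] == value)
            return i;
    return -1;
}
```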


Clearly, there are also reasons to choose int over std::ptrdiff_t - one of them is memory bandwidth. There are a lot of memory-bound algorithms, for them it is important to reduce the amount of memory transfered from RAM to cache.

If you know that all numbers are less than 2^31, it is an advantage to use int, because otherwise half of the memory transferred would consist of zeros that you already know are there.

An example is compressed sparse row (CSR) matrices: their indices are stored as ints and not long long. Because many operations with sparse matrices are memory bound, there is a real difference between using 32 or 64 bits.
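A minimal sketch of CSR storage with 32-bit indices (the struct layout is illustrative, not any particular library's): the two index arrays dominate the footprint, so halving the index width halves the memory traffic of a memory-bound kernel like the sparse matrix-vector product.

```cpp
#include <cstdint>
#include <vector>

// Compressed sparse row storage with 32-bit indices.
struct CsrMatrix {
    std::vector<std::int32_t> row_start; // size = rows + 1
    std::vector<std::int32_t> col_index; // size = number of nonzeros
    std::vector<double>       values;    // size = number of nonzeros
};

// y = A * x: for each row, accumulate values[k] * x[col_index[k]]
// over that row's slice [row_start[r], row_start[r+1]).
std::vector<double> spmv(const CsrMatrix& a, const std::vector<double>& x) {
    std::vector<double> y(a.row_start.size() - 1, 0.0);
    for (std::int32_t r = 0; r + 1 < static_cast<std::int32_t>(a.row_start.size()); ++r)
        for (std::int32_t k = a.row_start[r]; k < a.row_start[r + 1]; ++k)
            y[r] += a.values[k] * x[a.col_index[k]];
    return y;
}
```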

Source: https://stackoverflow.com/questions/48729384/is-there-still-a-reason-to-use-int-in-c-code
