
Windows cmd encoding change causes Python crash

lottogame 2020. 12. 29. 06:38



First, I change the Windows CMD encoding to utf-8 and run the Python interpreter:

chcp 65001
python

Then I try to print a Unicode string inside it, and when I do this Python crashes in a peculiar way (I just get dropped back to the cmd prompt in the same window).

>>> import sys
>>> print u'ëèæîð'.encode(sys.stdin.encoding)

Any ideas why this happens and how to make it work?

UPD: sys.stdin.encoding returns 'cp65001'

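UPD2: It just occurred to me that the issue might be connected with the fact that utf-8 uses a multi-byte character set (kcwu made a good point on that). I tried running the whole example with 'windows-1250' and got 'ëeaî?'. Windows-1250 uses a single-byte character set, so it worked for the characters it understands. However, I still have no idea how to make 'utf-8' work here.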

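UPD3: Oh, I found out it is a known Python bug. I guess what happens is that Python copies the cmd encoding as 'cp65001' to sys.stdin.encoding and tries to apply it to all input. Since it doesn't understand 'cp65001', it crashes on any input that contains non-ASCII characters.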
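To make the multi-byte point from UPD2 concrete, you can compare how many bytes each encoding produces for the same five characters. This is a Python 3 sketch (the question itself uses Python 2). It uses errors="replace" so that characters missing from windows-1250 become "?" instead of raising. The exact replacement characters may differ from the 'ëeaî?' quoted above.

```python
# Compare the byte length of the same string in a multi-byte and a single-byte encoding.
text = "ëèæîð"

utf8_bytes = text.encode("utf-8")                       # every one of these chars needs 2 bytes
cp1250_bytes = text.encode("cp1250", errors="replace")  # 1 byte per char; unmappable -> b"?"

print(len(text), len(utf8_bytes), len(cp1250_bytes))  # 5 10 5
print(utf8_bytes)  # b'\xc3\xab\xc3\xa8\xc3\xa6\xc3\xae\xc3\xb0'
```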


Here's how to alias cp65001 to UTF-8 without changing encodings\aliases.py:

import codecs
codecs.register(lambda name: codecs.lookup('utf-8') if name == 'cp65001' else None)
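To see the alias mechanism in isolation, here is a Python 3 sketch. Python 3.8 and later already treat cp65001 as an alias of utf-8, so this example registers a made-up name instead. The name myutf8alias is purely hypothetical and makes the effect visible on any current interpreter. Otherwise the search function follows the same idea as the one-liner above.

```python
import codecs

# Hypothetical alias name, used only for illustration.
UTF8_ALIASES = {"myutf8alias"}

def _utf8_alias_search(name):
    # A search function receives the (lower-cased) encoding name and returns
    # a CodecInfo, or None so that other registered search functions get a turn.
    if name in UTF8_ALIASES:
        return codecs.lookup("utf-8")
    return None

codecs.register(_utf8_alias_search)

print(codecs.lookup("myutf8alias").name)                           # utf-8
print("ëèæîð".encode("myutf8alias") == "ëèæîð".encode("utf-8"))  # True
```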

(IMHO, pay no attention to the silliness at http://bugs.python.org/issue6058#msg97731 about cp65001 not being identical to UTF-8. It's intended to be the same, even if Microsoft's codec has some minor bugs.)

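Here is some code (written for Tahoe-LAFS, tahoe-lafs.org) that makes console output work regardless of the chcp code page, and also reads Unicode command-line arguments. Credit to Michael Kaplan for the idea behind this solution. If stdout or stderr is redirected, it will output UTF-8. If you want a Byte Order Mark, you'll need to write it explicitly.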

[Edit: This version uses WriteConsoleW instead of the _O_U8TEXT flag in the MSVC runtime library, which is buggy. WriteConsoleW is also buggy relative to the MS documentation, but less so.]
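The console branch of UnicodeOutput.write in the code below does two things. It hands WriteConsoleW at most 10000 characters at a time, and it advances by however many characters the call reports as written. The redirected branch just writes UTF-8. Here is a platform-neutral Python 3 sketch of those two paths. The names write_chunked, write_redirected and fake_console are hypothetical; the fake_console stub stands in for the real ctypes call.

```python
import io

MAX_CHARS = 10000  # per-call cap used by the Tahoe-LAFS code (see its ticket 1232)

def write_chunked(text, write_wide, max_chars=MAX_CHARS):
    """Write `text` through `write_wide`, which stands in for WriteConsoleW.

    write_wide(chunk) must return how many characters it actually wrote;
    it may write fewer than len(chunk), so the loop resumes after them.
    """
    while text:
        n = write_wide(text[:max_chars])
        if n <= 0:
            # The original raises IOError when WriteConsoleW fails or writes nothing.
            raise IOError("console write made no progress (n=%r)" % (n,))
        text = text[n:]

def write_redirected(text, binary_stream):
    """Redirected stdout/stderr path: plain UTF-8 bytes, no byte order mark."""
    binary_stream.write(text.encode("utf-8"))

# Usage with a fake console that accepts at most 3 characters per call.
written = []

def fake_console(chunk):
    part = chunk[:3]
    written.append(part)
    return len(part)

write_chunked("ëèæîð-ok", fake_console, max_chars=4)
print(written)  # ['ëèæ', 'îð-', 'ok']

buf = io.BytesIO()
write_redirected("ëè", buf)
print(buf.getvalue())  # b'\xc3\xab\xc3\xa8'
```

Checking n.value (not just the return code) is what keeps a short write from silently dropping the rest of the text.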

import sys
if sys.platform == "win32":
    import codecs
    from ctypes import WINFUNCTYPE, windll, POINTER, byref, c_int
    from ctypes.wintypes import BOOL, HANDLE, DWORD, LPWSTR, LPCWSTR, LPVOID

    original_stderr = sys.stderr

    # If any exception occurs in this code, we'll probably try to print it on stderr,
    # which makes for frustrating debugging if stderr is directed to our wrapper.
    # So be paranoid about catching errors and reporting them to original_stderr,
    # so that we can at least see them.
    def _complain(message):
        print >>original_stderr, message if isinstance(message, str) else repr(message)

    # Work around <http://bugs.python.org/issue6058>.
    codecs.register(lambda name: codecs.lookup('utf-8') if name == 'cp65001' else None)

    # Make Unicode console output work independently of the current code page.
    # This also fixes <http://bugs.python.org/issue1602>.
    # Credit to Michael Kaplan <http://www.siao2.com/2010/04/07/9989346.aspx>
    # and TZOmegaTZIOY
    # <http://stackoverflow.com/questions/878972/windows-cmd-encoding-change-causes-python-crash/1432462#1432462>.
    try:
        # <http://msdn.microsoft.com/en-us/library/ms683231(VS.85).aspx>
        # HANDLE WINAPI GetStdHandle(DWORD nStdHandle);
        # returns INVALID_HANDLE_VALUE, NULL, or a valid handle
        #
        # <http://msdn.microsoft.com/en-us/library/aa364960(VS.85).aspx>
        # DWORD WINAPI GetFileType(HANDLE hFile);
        #
        # <http://msdn.microsoft.com/en-us/library/ms683167(VS.85).aspx>
        # BOOL WINAPI GetConsoleMode(HANDLE hConsole, LPDWORD lpMode);

        GetStdHandle = WINFUNCTYPE(HANDLE, DWORD)(("GetStdHandle", windll.kernel32))
        STD_OUTPUT_HANDLE = DWORD(-11)
        STD_ERROR_HANDLE = DWORD(-12)
        # GetFileType takes a HANDLE; declaring the argument as DWORD can truncate handles on 64-bit Python.
        GetFileType = WINFUNCTYPE(DWORD, HANDLE)(("GetFileType", windll.kernel32))
        FILE_TYPE_CHAR = 0x0002
        FILE_TYPE_REMOTE = 0x8000
        GetConsoleMode = WINFUNCTYPE(BOOL, HANDLE, POINTER(DWORD))(("GetConsoleMode", windll.kernel32))
        # INVALID_HANDLE_VALUE is (HANDLE)-1, which is pointer-sized rather than a 32-bit DWORD.
        INVALID_HANDLE_VALUE = HANDLE(-1).value

        def not_a_console(handle):
            if handle == INVALID_HANDLE_VALUE or handle is None:
                return True
            return ((GetFileType(handle) & ~FILE_TYPE_REMOTE) != FILE_TYPE_CHAR
                    or GetConsoleMode(handle, byref(DWORD())) == 0)

        old_stdout_fileno = None
        old_stderr_fileno = None
        if hasattr(sys.stdout, 'fileno'):
            old_stdout_fileno = sys.stdout.fileno()
        if hasattr(sys.stderr, 'fileno'):
            old_stderr_fileno = sys.stderr.fileno()

        STDOUT_FILENO = 1
        STDERR_FILENO = 2
        real_stdout = (old_stdout_fileno == STDOUT_FILENO)
        real_stderr = (old_stderr_fileno == STDERR_FILENO)

        if real_stdout:
            hStdout = GetStdHandle(STD_OUTPUT_HANDLE)
            if not_a_console(hStdout):
                real_stdout = False

        if real_stderr:
            hStderr = GetStdHandle(STD_ERROR_HANDLE)
            if not_a_console(hStderr):
                real_stderr = False

        if real_stdout or real_stderr:
            # BOOL WINAPI WriteConsoleW(HANDLE hOutput, LPWSTR lpBuffer, DWORD nChars,
            #                           LPDWORD lpCharsWritten, LPVOID lpReserved);

            WriteConsoleW = WINFUNCTYPE(BOOL, HANDLE, LPWSTR, DWORD, POINTER(DWORD), LPVOID)(("WriteConsoleW", windll.kernel32))

            class UnicodeOutput:
                def __init__(self, hConsole, stream, fileno, name):
                    self._hConsole = hConsole
                    self._stream = stream
                    self._fileno = fileno
                    self.closed = False
                    self.softspace = False
                    self.mode = 'w'
                    self.encoding = 'utf-8'
                    self.name = name
                    self.flush()

                def isatty(self):
                    return False

                def close(self):
                    # don't really close the handle, that would only cause problems
                    self.closed = True

                def fileno(self):
                    return self._fileno

                def flush(self):
                    if self._hConsole is None:
                        try:
                            self._stream.flush()
                        except Exception as e:
                            _complain("%s.flush: %r from %r" % (self.name, e, self._stream))
                            raise

                def write(self, text):
                    try:
                        if self._hConsole is None:
                            if isinstance(text, unicode):
                                text = text.encode('utf-8')
                            self._stream.write(text)
                        else:
                            if not isinstance(text, unicode):
                                text = str(text).decode('utf-8')
                            remaining = len(text)
                            while remaining:
                                n = DWORD(0)
                                # There is a shorter-than-documented limitation on the
                                # length of the string passed to WriteConsoleW (see
                                # <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1232>.
                                retval = WriteConsoleW(self._hConsole, text, min(remaining, 10000), byref(n), None)
                                if retval == 0 or n.value == 0:
                                    raise IOError("WriteConsoleW returned %r, n.value = %r" % (retval, n.value))
                                remaining -= n.value
                                if not remaining:
                                    break
                                text = text[n.value:]
                    except Exception as e:
                        _complain("%s.write: %r" % (self.name, e))
                        raise

                def writelines(self, lines):
                    try:
                        for line in lines:
                            self.write(line)
                    except Exception as e:
                        _complain("%s.writelines: %r" % (self.name, e))
                        raise

            if real_stdout:
                sys.stdout = UnicodeOutput(hStdout, None, STDOUT_FILENO, '<Unicode console stdout>')
            else:
                sys.stdout = UnicodeOutput(None, sys.stdout, old_stdout_fileno, '<Unicode redirected stdout>')

            if real_stderr:
                sys.stderr = UnicodeOutput(hStderr, None, STDERR_FILENO, '<Unicode console stderr>')
            else:
                sys.stderr = UnicodeOutput(None, sys.stderr, old_stderr_fileno, '<Unicode redirected stderr>')
    except Exception as e:
        _complain("exception %r while fixing up sys.stdout and sys.stderr" % (e,))


    # While we're at it, let's unmangle the command-line arguments:

    # This works around <http://bugs.python.org/issue2128>.
    GetCommandLineW = WINFUNCTYPE(LPWSTR)(("GetCommandLineW", windll.kernel32))
    CommandLineToArgvW = WINFUNCTYPE(POINTER(LPWSTR), LPCWSTR, POINTER(c_int))(("CommandLineToArgvW", windll.shell32))

    argc = c_int(0)
    argv_unicode = CommandLineToArgvW(GetCommandLineW(), byref(argc))

    argv = [argv_unicode[i].encode('utf-8') for i in xrange(0, argc.value)]

    if not hasattr(sys, 'frozen'):
        # If this is an executable produced by py2exe or bbfreeze, then it will
        # have been invoked directly. Otherwise, argv[0] is the Python
        # interpreter, so skip that.
        argv = argv[1:]

        # Also skip option arguments to the Python interpreter.
        while len(argv) > 0:
            arg = argv[0]
            if not arg.startswith(u"-") or arg == u"-":
                break
            argv = argv[1:]
            if arg == u'-m':
                # sys.argv[0] should really be the absolute path of the module source,
                # but never mind
                break
            if arg == u'-c':
                argv[0] = u'-c'
                break

    # if you like:
    sys.argv = argv
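The argument-stripping half of the code above is easier to check in isolation. Below is a Python 3 sketch that applies the same rules to an already-decoded list, as CommandLineToArgvW would return it. The name strip_interpreter_args is hypothetical. There is one deliberate difference from the original: a trailing -c with nothing after it yields ['-c'] instead of raising IndexError. Like the original, it does not know about options that take a separate value (e.g. -W ignore), so it would mistake that value for the script name.

```python
def strip_interpreter_args(full_argv, frozen=False):
    """Apply the argv clean-up rules above to a list of str arguments."""
    argv = list(full_argv)
    if frozen:
        # py2exe/bbfreeze executables are invoked directly: keep everything.
        return argv
    argv = argv[1:]  # argv[0] is the Python interpreter itself
    while argv:
        arg = argv[0]
        if not arg.startswith("-") or arg == "-":
            break  # first non-option argument (script name or "-")
        argv = argv[1:]
        if arg == "-m":
            break  # the module name is left in argv[0]
        if arg == "-c":
            argv = ["-c"] + argv[1:]  # the command string is reported as "-c"
            break
    return argv

print(strip_interpreter_args(["python", "-u", "script.py", "ünï"]))  # ['script.py', 'ünï']
print(strip_interpreter_args(["python", "-c", "print(1)", "x"]))     # ['-c', 'x']
print(strip_interpreter_args(["app.exe", "ünï"], frozen=True))       # ['app.exe', 'ünï']
```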

Finally, it is possible to grant ΤΖΩΤΖΙΟΥ's wish to use DejaVu Sans Mono, which I agree is an excellent font, for the console.

You can find information on the font requirements and how to add new fonts for the Windows console in the Microsoft KB article 'Necessary criteria for fonts to be available in a command window'.

But basically, on Vista (and probably Win7):

  • Under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont, set "0" to "DejaVu Sans Mono";
  • For each subkey under HKEY_CURRENT_USER\Console, set "FaceName" to "DejaVu Sans Mono".
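If you would rather script these two registry steps than click through regedit, here is a rough sketch using Python 3's standard winreg module. The helper names (console_font_plan, list_console_subkeys, apply_plan) are hypothetical, and so is the split into a "plan" step and an "apply" step. The plan step lets you inspect the changes before writing anything. Writing under HKEY_LOCAL_MACHINE needs administrator rights, and the sketch assumes 64-bit Python on 64-bit Windows (32-bit processes may see a redirected view of HKLM\SOFTWARE). Back up both keys first.

```python
import sys

FONT = "DejaVu Sans Mono"  # the font named in the steps above
TRUETYPE_KEY = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont"
CONSOLE_KEY = "Console"

def console_font_plan(console_subkeys, font=FONT):
    """Describe the two registry steps as (hive, key, value name, data) tuples."""
    plan = [("HKLM", TRUETYPE_KEY, "0", font)]
    for sub in console_subkeys:
        plan.append(("HKCU", CONSOLE_KEY + "\\" + sub, "FaceName", font))
    return plan

def list_console_subkeys():
    """Enumerate the subkeys of HKEY_CURRENT_USER\\Console (Windows only)."""
    import winreg
    names = []
    with winreg.OpenKey(winreg.HKEY_CURRENT_USER, CONSOLE_KEY) as key:
        index = 0
        while True:
            try:
                names.append(winreg.EnumKey(key, index))
            except OSError:  # raised when there are no more subkeys
                break
            index += 1
    return names

def apply_plan(plan):
    """Write the planned REG_SZ values. HKLM writes need administrator rights."""
    if sys.platform != "win32":
        raise OSError("the console registry keys only exist on Windows")
    import winreg
    hives = {"HKLM": winreg.HKEY_LOCAL_MACHINE, "HKCU": winreg.HKEY_CURRENT_USER}
    for hive, key_path, value_name, data in plan:
        with winreg.OpenKey(hives[hive], key_path, 0, winreg.KEY_SET_VALUE) as key:
            winreg.SetValueEx(key, value_name, 0, winreg.REG_SZ, data)
```

On Windows, apply_plan(console_font_plan(list_console_subkeys())) would perform both steps. Printing the plan first is a cheap way to check what will change.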

On XP, check the thread 'Changing Command Prompt fonts?' in the LockerGnome forums.


Set the PYTHONIOENCODING system variable:

> chcp 65001
> set PYTHONIOENCODING=utf-8
> python example.py
Encoding is utf-8

The source of example.py is simple:

import sys
print "Encoding is", sys.stdin.encoding
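A cmd "set" only lasts for the current session and the processes started from it. If you want the same effect for a single child process, you can pass the variable in that child's environment. Here is a Python 3 sketch; child_stdout_encoding is a hypothetical helper name. Note that since Python 3.6 the interactive Windows console already uses a UTF-8 console I/O layer (PEP 528), so the variable mainly matters for redirected output and for older interpreters.

```python
import os
import subprocess
import sys

def child_stdout_encoding(io_encoding):
    """Run a child interpreter with PYTHONIOENCODING set and return
    the encoding its sys.stdout reports."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

print(child_stdout_encoding("utf-8"))  # utf-8
```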

Do you want Python to encode to UTF-8?

>>> print u'ëèæîð'.encode('utf-8')
ëèæîð

Python does not recognize cp65001 as UTF-8.


I had this annoying issue too, and I hated not being able to run my Unicode-aware scripts the same way in MS Windows as in Linux. So I managed to come up with a workaround.

Take this script (say, uniconsole.py in your site-packages or whatever):

import sys, os

if sys.platform == "win32":
    class UniStream(object):
        __slots__= ("fileno", "softspace",)

        def __init__(self, fileobject):
            self.fileno = fileobject.fileno()
            self.softspace = False

        def write(self, text):
            os.write(self.fileno, text.encode("utf_8") if isinstance(text, unicode) else text)

    sys.stdout = UniStream(sys.stdout)
    sys.stderr = UniStream(sys.stderr)

This seems to work around the python bug (or win32 unicode console bug, whatever). Then I added in all related scripts:

import sys

try:
    import uniconsole
except ImportError:
    sys.exc_clear()  # could be just pass, of course
else:
    del uniconsole  # reduce pollution, not needed anymore

Finally, I just run my scripts as needed in a console where chcp 65001 is run and the font is Lucida Console. (How I wish that DejaVu Sans Mono could be used instead… but hacking the registry and selecting it as a console font reverts to a bitmap font.)

This is a quick-and-dirty stdout and stderr replacement, and also does not handle any raw_input related bugs (obviously, since it doesn't touch sys.stdin at all). And, by the way, I've added the cp65001 alias for utf_8 in the encodings\aliases.py file of the standard lib.
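In Python 3.7 and later, you can get a similar effect to UniStream without replacing the stream object, by using io.TextIOWrapper.reconfigure. Here is a minimal sketch; force_utf8 is a hypothetical helper name. Since Python 3.6 the interactive Windows console already uses UTF-8 console I/O (PEP 528), so this mostly matters when stdout or stderr is redirected.

```python
import io
import sys

def force_utf8(stream):
    """Switch a text stream to UTF-8 in place when it supports reconfigure().

    Streams without reconfigure() (e.g. an IDE's replacement sys.stdout)
    are returned unchanged.
    """
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is not None:
        reconfigure(encoding="utf-8")  # implicitly flushes pending output first
    return stream

if __name__ == "__main__":
    force_utf8(sys.stdout)
    force_utf8(sys.stderr)

# Demonstration on an in-memory stream that starts out as cp1252.
raw = io.BytesIO()
demo = force_utf8(io.TextIOWrapper(raw, encoding="cp1252"))
demo.write("ëèæîð")
demo.flush()
print(raw.getvalue())  # b'\xc3\xab\xc3\xa8\xc3\xa6\xc3\xae\xc3\xb0'
```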


For the "unknown encoding: cp65001" issue, you can set a new variable named PYTHONIOENCODING with the value UTF-8. (This works for me.)



This is because the "code page" of cmd is different from the system's "mbcs" encoding. Although you changed the code page, Python (actually, Windows) still thinks your mbcs encoding hasn't changed.
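One way to see this mismatch is to print the console code pages next to the ANSI code page, which Python's mbcs codec follows. Here is a sketch using ctypes; encoding_report is a hypothetical name, and on non-Windows platforms only the locale's preferred encoding is reported. After chcp 65001 you would typically expect the console output code page to read 65001 while the ANSI code page keeps its old value (e.g. 1252).

```python
import locale
import sys

def encoding_report():
    """Return the console code pages and the ANSI code page side by side.

    On Windows, chcp changes the console code page, while the "mbcs"
    codec follows the ANSI code page (GetACP), which chcp does not touch.
    """
    report = {"preferred": locale.getpreferredencoding(False)}
    if sys.platform == "win32":
        import ctypes
        kernel32 = ctypes.windll.kernel32
        report["console_input_cp"] = kernel32.GetConsoleCP()
        report["console_output_cp"] = kernel32.GetConsoleOutputCP()
        report["ansi_cp"] = kernel32.GetACP()
    return report

print(encoding_report())
```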


For me setting this env var before execution of python program worked:

set PYTHONIOENCODING=utf-8

A few comments: you probably misspelled encodig and .code. Here is my run of your example.

C:\>chcp 65001
Active code page: 65001

C:\>\python25\python
...
>>> import sys
>>> sys.stdin.encoding
'cp65001'
>>> s=u'\u0065\u0066'
>>> s
u'ef'
>>> s.encode(sys.stdin.encoding)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown encoding: cp65001
>>>

The conclusion: cp65001 is not an encoding Python knows. Try 'UTF-16' or something similar.
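A small helper makes that diagnosis reusable. It asks the codec registry for a name and reports whether the lookup succeeds, which is the same LookupError path the transcript hits. The name is_known_encoding is hypothetical. The answer for cp65001 depends on the Python version: Python 2 does not know it, while Python 3.8 and later alias it to utf-8.

```python
import codecs

def is_known_encoding(name):
    """Return True if Python has a codec registered under `name`."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True

print(is_known_encoding("utf-16"))         # True
print(is_known_encoding("no-such-codec"))  # False
print(is_known_encoding("cp65001"))        # depends on the Python version
```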


The problem has been solved and addressed in this thread:

Change the system encoding

The solution is to disable the "Beta: Use Unicode UTF-8 for worldwide language support" option in Windows. This requires a restart, after which Python should be back to normal.

Steps for Windows:

  1. Go to Control Panel
  2. Select Clock and Region
  3. Click Region > Administrative
  4. Under Language for non-Unicode programs, click “Change system locale”
  5. In the “Region Settings” window that pops up, untick “Beta: Use Unicode UTF-8...”
  6. Restart the machine when Windows prompts you to


ReferenceURL : https://stackoverflow.com/questions/878972/windows-cmd-encoding-change-causes-python-crash
