How do I tokenize a string in C++?

lottogame 2020. 2. 22. 11:22

Java has a convenient split method:

String str = "The quick brown fox";
String[] results = str.split(" ");

Is there an easy way to do this in C++?

Simple cases can be handled easily with the std::string::find method. However, take a look at Boost.Tokenizer. It's great. Boost generally has some very cool string tools.

The Boost tokenizer class can make this sort of thing quite simple:

#include <iostream>
#include <string>
#include <boost/foreach.hpp>
#include <boost/tokenizer.hpp>

using namespace std;
using namespace boost;

int main(int, char**)
{
    string text = "token, test   string";

    char_separator<char> sep(", ");
    tokenizer< char_separator<char> > tokens(text, sep);
    BOOST_FOREACH (const string& t, tokens) {
        cout << t << "." << endl;
    }
}

Updated for C++11:

#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>

using namespace std;
using namespace boost;

int main(int, char**)
{
    string text = "token, test   string";

    char_separator<char> sep(", ");
    tokenizer<char_separator<char>> tokens(text, sep);
    for (const auto& t : tokens) {
        cout << t << "." << endl;
    }
}
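For the simple std::string::find route mentioned at the start of this answer, a minimal sketch might look like the following. The name split_on is hypothetical; it splits on a single character and keeps empty fields. Note that Java's String.split discards trailing empty strings, so the two differ for input such as "a,b,".

```cpp
#include <string>
#include <vector>

// split_on (hypothetical helper): split `text` on every occurrence of the
// single character `delim`, keeping empty fields ("a,,b" -> "a", "", "b").
std::vector<std::string> split_on(const std::string& text, char delim)
{
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type pos = text.find(delim, start);
        if (pos == std::string::npos) {
            fields.push_back(text.substr(start)); // last field runs to the end
            break;
        }
        fields.push_back(text.substr(start, pos - start));
        start = pos + 1; // continue just past the delimiter
    }
    return fields;
}
```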

Here's a really simple one:

#include <vector>
#include <string>
using namespace std;

vector<string> split(const char *str, char c = ' ')
{
    vector<string> result;

    do
    {
        const char *begin = str;

        while(*str != c && *str)
            str++;

        result.push_back(string(begin, str));
    } while (0 != *str++);

    return result;
}
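This split keeps empty tokens: consecutive delimiters, or a delimiter at either end, each produce an empty string. If you would rather drop them, a variant along these lines should work (split_nonempty is a hypothetical name, not part of the answer):

```cpp
#include <string>
#include <vector>

// split_nonempty (hypothetical): like split() above, but runs of the
// delimiter and leading/trailing delimiters produce no empty tokens.
std::vector<std::string> split_nonempty(const char* str, char c = ' ')
{
    std::vector<std::string> result;
    while (*str) {
        while (*str == c)          // skip any run of delimiters
            ++str;
        const char* begin = str;
        while (*str && *str != c)  // scan to the end of the token
            ++str;
        if (str != begin)          // only keep non-empty tokens
            result.push_back(std::string(begin, str));
    }
    return result;
}
```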

Use strtok. In my opinion, there's no need to build a class around tokenizing unless strtok doesn't give you what you need. In 15+ years of writing all kinds of parsing code in C and C++, I've always used strtok. Here is an example:

char myString[] = "The quick brown fox";
char *p = strtok(myString, " ");
while (p) {
    printf ("Token: %s\n", p);
    p = strtok(NULL, " ");
}

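A few caveats (which might not suit your needs). The string is "destroyed" in the process: EOS ('\0') characters are written inline at the delimiter positions. Correct usage may require you to make a non-const copy of the string. You can also change the list of delimiters mid-parse.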

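In my opinion, the code above is far simpler and easier to use than writing a separate class for it. To me, this is one of those functions the language provides, and it does the job well and cleanly. It's simply a "C-based" solution. It's appropriate, it's easy, and you don't have to write a lot of extra code :-)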
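Because strtok writes into its argument, you cannot hand it a string literal or the buffer of a const std::string. A minimal sketch, assuming you want to keep the original std::string intact, tokenizes a writable copy instead (strtok_split is a hypothetical helper name):

```cpp
#include <cstring>
#include <string>
#include <vector>

// strtok_split (hypothetical): strtok writes '\0' into its argument, so
// tokenize a private, writable copy and leave the caller's string untouched.
std::vector<std::string> strtok_split(const std::string& text, const char* delims)
{
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0'); // strtok needs a NUL-terminated C string

    std::vector<std::string> tokens;
    for (char* p = std::strtok(buffer.data(), delims); p != nullptr;
         p = std::strtok(nullptr, delims))
        tokens.push_back(p);
    return tokens;
}
```

strtok still keeps its position in hidden static state, so this helper is not safe to call from several threads at once.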

Another quick way is to use getline. Something like:

stringstream ss("bla bla");
string s;

while (getline(ss, s, ' ')) {
 cout << s << endl;
}

If you want, you can make a simple split() method that returns a vector<string>.
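For instance, a sketch of such a split() built on std::getline (the function name and the default delimiter are assumptions):

```cpp
#include <sstream>
#include <string>
#include <vector>

// split (hypothetical signature): break `text` on `delim` using std::getline.
std::vector<std::string> split(const std::string& text, char delim = ' ')
{
    std::vector<std::string> parts;
    std::istringstream stream(text);
    std::string part;
    while (std::getline(stream, part, delim)) // stops once nothing is left
        parts.push_back(part);
    return parts;
}
```

Repeated delimiters yield empty strings, while a trailing delimiter does not add a final empty element, because getline fails once there is nothing left to read.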

You can do this fairly directly with streams, iterators, and the copy algorithm.

#include <string>
#include <vector>
#include <iostream>
#include <istream>
#include <ostream>
#include <iterator>
#include <sstream>
#include <algorithm>

int main()
{
  std::string str = "The quick brown fox";

  // construct a stream from the string
  std::stringstream strstr(str);

  // use stream iterators to copy the stream to the vector as whitespace separated strings
  std::istream_iterator<std::string> it(strstr);
  std::istream_iterator<std::string> end;
  std::vector<std::string> results(it, end);

  // send the vector to stdout, one token per line
  std::ostream_iterator<std::string> oit(std::cout, "\n");
  std::copy(results.begin(), results.end(), oit);
}

No offense, folks, but for such a simple problem you're making things way too complicated. There are lots of reasons to use Boost, but for something this simple, it's like hitting a fly with a 20 lb sledgehammer.

void
split( vector<string> & theStringVector,  /* Altered/returned value */
       const  string  & theString,
       const  string  & theDelimiter)
{
    UASSERT( theDelimiter.size(), >, 0); // My own ASSERT macro.

    size_t  start = 0, end = 0;

    while ( end != string::npos)
    {
        end = theString.find( theDelimiter, start);

        // If at end, use length=maxLength.  Else use length=end-start.
        theStringVector.push_back( theString.substr( start,
                       (end == string::npos) ? string::npos : end - start));

        // If at end, use start=maxSize.  Else use start=end+delimiter.
        start = (   ( end > (string::npos - theDelimiter.size()) )
                  ?  string::npos  :  end + theDelimiter.size());
    }
}

Example (for Doug's case):

#define SHOW(I,X)   cout << "[" << (I) << "]\t " # X " = \"" << (X) << "\"" << endl

int
main()
{
    vector<string> v;

    split( v, "A:PEP:909:Inventory Item", ":" );

    for (unsigned int i = 0;  i < v.size();   i++)
        SHOW( i, v[i] );
}

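And yes, I could have had split() return a new vector rather than passing one in. It's trivial to wrap and overload. But depending on what I'm doing, I often find it better to reuse pre-existing objects rather than always creating new ones. (Just as long as I don't forget to empty the vector in between!)

Reference: http://www.cplusplus.com/reference/string/string/.

(I was originally writing a response to Doug's question, C++ Strings Modifying and Extracting based on Separators (closed). But since Martin York closed that question with a pointer over here, I'll just generalize my code.)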
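As a sketch of the wrap-and-overload idea, assuming the standard assert is acceptable in place of the author's UASSERT macro: the out-parameter version below clears the vector itself (a small deviation from the answer, which leaves that to the caller), and a second overload returns a new vector by value.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Out-parameter version (standard assert stands in for the author's UASSERT).
// Clearing `out` first means a reused vector never keeps stale entries.
void split(std::vector<std::string>& out, const std::string& text,
           const std::string& delimiter)
{
    assert(!delimiter.empty());
    out.clear();
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type end = text.find(delimiter, start);
        if (end == std::string::npos) {
            out.push_back(text.substr(start)); // rest of the string
            break;
        }
        out.push_back(text.substr(start, end - start));
        start = end + delimiter.size();        // skip the whole delimiter
    }
}

// Convenience overload that returns a fresh vector by value.
std::vector<std::string> split(const std::string& text, const std::string& delimiter)
{
    std::vector<std::string> out;
    split(out, text, delimiter);
    return out;
}
```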

A solution using regex_token_iterator:

#include <iostream>
#include <regex>
#include <string>
#include <vector>

using namespace std;

int main()
{
    string str("The quick brown fox");

    regex reg("\\s+");

    sregex_token_iterator iter(str.begin(), str.end(), reg, -1);
    sregex_token_iterator end;

    vector<string> vec(iter, end);

    for (auto a : vec)
    {
        cout << a << endl;
    }
}
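One thing to watch with submatch index -1: it selects the text between matches, so input that begins with whitespace can yield an empty first token. A minimal sketch that filters such empty pieces out (split_ws is a hypothetical name):

```cpp
#include <regex>
#include <string>
#include <vector>

// split_ws (hypothetical): split on runs of whitespace. Submatch -1 selects
// the text *between* matches; empty pieces (e.g. before leading whitespace)
// are dropped.
std::vector<std::string> split_ws(const std::string& text)
{
    static const std::regex ws("\\s+");
    std::vector<std::string> out;
    for (std::sregex_token_iterator it(text.begin(), text.end(), ws, -1), end;
         it != end; ++it) {
        if (it->length() > 0)
            out.push_back(it->str());
    }
    return out;
}
```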

Boost has a powerful split function: boost::algorithm::split.

Sample program:

#include <iostream>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>

int main() {
    auto s = "a,b, c ,,e,f,";
    std::vector<std::string> fields;
    boost::split(fields, s, boost::is_any_of(","));
    for (const auto& field : fields)
        std::cout << "\"" << field << "\"\n";
    return 0;
}

Output:

"a"
"b"
" c "
""
"e"
"f"
""

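If you want the same behavior without Boost, a standard-library-only sketch that splits on any of the given characters and keeps empty fields could look like this (split_any_of is a hypothetical name, not a Boost function):

```cpp
#include <string>
#include <vector>

// split_any_of (hypothetical): split on any character in `delims`,
// keeping every empty field, similar to boost::split with is_any_of.
std::vector<std::string> split_any_of(const std::string& text, const std::string& delims)
{
    std::vector<std::string> fields(1); // always at least one (possibly empty) field
    for (char ch : text) {
        if (delims.find(ch) != std::string::npos)
            fields.emplace_back();      // delimiter: start a new field
        else
            fields.back() += ch;        // otherwise extend the current field
    }
    return fields;
}
```

With Boost itself, passing boost::token_compress_on as a fourth argument to boost::split merges runs of adjacent delimiters into one, so the "" between " c " and "e" above would disappear.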
I know you asked for a C++ solution, but you might find this helpful:

#include <QString>

...

QString str = "The quick brown fox"; 
QStringList results = str.split(" ");

The advantage over Boost in this example is that it maps one-to-one onto the code in your post.

See more in the Qt documentation.

Here is a sample tokenizer class that might do what you want:

//Header file
#include <cstddef>
#include <string>

class Tokenizer 
{
    public:
        static const std::string DELIMITERS;
        Tokenizer(const std::string& str);
        Tokenizer(const std::string& str, const std::string& delimiters);
        bool NextToken();
        bool NextToken(const std::string& delimiters);
        const std::string GetToken() const;
        void Reset();
    protected:
        size_t m_offset;
        const std::string m_string;
        std::string m_token;
        std::string m_delimiters;
};

//CPP file
const std::string Tokenizer::DELIMITERS(" \t\n\r");

Tokenizer::Tokenizer(const std::string& s) :
    m_offset(0),
    m_string(s),
    m_delimiters(DELIMITERS) {}

Tokenizer::Tokenizer(const std::string& s, const std::string& delimiters) :
    m_offset(0),
    m_string(s),
    m_delimiters(delimiters) {}

bool Tokenizer::NextToken() 
{
    return NextToken(m_delimiters);
}

bool Tokenizer::NextToken(const std::string& delimiters) 
{
    size_t i = m_string.find_first_not_of(delimiters, m_offset);
    if (std::string::npos == i) 
    {
        m_offset = m_string.length();
        return false;
    }

    size_t j = m_string.find_first_of(delimiters, i);
    if (std::string::npos == j) 
    {
        m_token = m_string.substr(i);
        m_offset = m_string.length();
        return true;
    }

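    m_token = m_string.substr(i, j - i);
    m_offset = j;
    return true;
}

// Returns the token found by the last successful NextToken() call
const std::string Tokenizer::GetToken() const
{
    return m_token;
}

// Restarts tokenizing from the beginning of the string
void Tokenizer::Reset()
{
    m_offset = 0;
}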

Example:

std::vector <std::string> v;
Tokenizer s("split this string", " ");
while (s.NextToken())
{
    v.push_back(s.GetToken());
}

This is a simple STL-only solution (~5 lines!) using std::string::find and std::string::find_first_not_of that handles repeated delimiters (spaces or periods, for instance) as well as leading and trailing delimiters:

#include <string>
#include <vector>

const char DELIMITER = ' '; // delimiter character (left undefined in the original snippet; a space is assumed)

void tokenize(const std::string& str, std::vector<std::string> &token_v){
    size_t start = str.find_first_not_of(DELIMITER), end=start;

    while (start != std::string::npos){
        // Find next occurrence of delimiter
        end = str.find(DELIMITER, start);
        // Push back the token found into vector
        token_v.push_back(str.substr(start, end-start));
        // Skip all occurrences of the delimiter to find new start
        start = str.find_first_not_of(DELIMITER, end);
    }
}

Try it live!

pystring is a small library that implements many of Python's string functions, including the split method:

#include <string>
#include <vector>
#include "pystring.h"

std::vector<std::string> chunks;
pystring::split("this string", chunks);

// also can specify a separator
pystring::split("this-string", chunks, "-");

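I posted this answer to a similar question.
Don't reinvent the wheel. I've used a number of libraries, and the fastest and most flexible I've come across is the C++ String Toolkit Library.

Here is an example of how to use it that I've posted elsewhere on Stack Overflow.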

#include <iostream>
#include <vector>
#include <string>
#include <strtk.hpp>

const char *whitespace  = " \t\r\n\f";
const char *whitespace_and_punctuation  = " \t\r\n\f;,=";

int main()
{
    {   // normal parsing of a string into a vector of strings
       std::string s("Somewhere down the road");
       std::vector<std::string> result;
       if( strtk::parse( s, whitespace, result ) )
       {
           for(size_t i = 0; i < result.size(); ++i )
            std::cout << result[i] << std::endl;
       }
    }

    {  // parsing a string into a vector of floats with other separators
       // besides spaces

       std::string s("3.0, 3.14; 4.0");
       std::vector<float> values;
       if( strtk::parse( s, whitespace_and_punctuation, values ) )
       {
           for(size_t i = 0; i < values.size(); ++i )
            std::cout << values[i] << std::endl;
       }
    }

    {  // parsing a string into specific variables

       std::string s("angle = 45; radius = 9.9");
       std::string w1, w2;
       float v1, v2;
       if( strtk::parse( s, whitespace_and_punctuation, w1, v1, w2, v2) )
       {
           std::cout << "word " << w1 << ", value " << v1 << std::endl;
           std::cout << "word " << w2 << ", value " << v2 << std::endl;
       }
    }

    return 0;
}

Check out this example; it may help you:

#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main ()
{
    string tmps;
    istringstream is ("the delimiter is the space");
    // test the extraction itself, so a failed read is never printed
    while (is >> tmps) {
        cout << tmps << "\n";
    }
    return 0;
}

MFC/ATL has a very nice tokenizer. From MSDN:

CAtlString str( "%First Second#Third" );
CAtlString resToken;
int curPos= 0;

resToken= str.Tokenize("% #",curPos);
while (resToken != "")
{
   printf("Resulting token: %s\n", resToken);
   resToken= str.Tokenize("% #",curPos);
};

Output

Resulting token: First
Resulting token: Second
Resulting token: Third

If you're willing to use C, you can use the strtok function. Watch out for multithreading issues when using it.
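On POSIX systems, the re-entrant variant strtok_r addresses the threading issue by keeping the parsing position in a pointer you supply instead of in hidden static state. A sketch, assuming a POSIX platform (on MSVC the analogous function is strtok_s); split_r is a hypothetical name:

```cpp
#include <string.h>  // strtok_r is POSIX and is declared here on POSIX systems
#include <string>
#include <vector>

// split_r (hypothetical): re-entrant tokenizing with strtok_r. The position
// lives in the caller-owned `save` pointer, so separate threads or
// interleaved loops can each tokenize their own buffer.
std::vector<std::string> split_r(std::string text, const char* delims)
{
    std::vector<std::string> tokens;
    char* save = nullptr;
    // `text` is a by-value copy, so the '\0's written by strtok_r stay local
    for (char* tok = strtok_r(&text[0], delims, &save); tok != nullptr;
         tok = strtok_r(nullptr, delims, &save))
        tokens.push_back(tok);
    return tokens;
}
```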

For simple stuff I just use the following:

unsigned TokenizeString(const std::string& i_source,
                        const std::string& i_seperators,
                        bool i_discard_empty_tokens,
                        std::vector<std::string>& o_tokens)
{
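    std::string::size_type prev_pos = 0; // size_type, not unsigned, so the npos comparison below works on 64-bit
    std::string::size_type pos = 0;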
    unsigned number_of_tokens = 0;
    o_tokens.clear();
    pos = i_source.find_first_of(i_seperators, pos);
    while (pos != std::string::npos)
    {
        std::string token = i_source.substr(prev_pos, pos - prev_pos);
        if (!i_discard_empty_tokens || token != "")
        {
            o_tokens.push_back(i_source.substr(prev_pos, pos - prev_pos));
            number_of_tokens++;
        }

        pos++;
        prev_pos = pos;
        pos = i_source.find_first_of(i_seperators, pos);
    }

    if (prev_pos < i_source.length())
    {
        o_tokens.push_back(i_source.substr(prev_pos));
        number_of_tokens++;
    }

    return number_of_tokens;
}

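Cowardly disclaimer: I write real-time data-processing software where data comes in through binary files, sockets, or API calls (I/O cards, cameras). I never use this function for anything more complicated or time-critical than reading external configuration files at startup.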

You could simply use a regular expression library and solve this with regular expressions.

Use the expression (\w+) and the variable in \1 (or $1, depending on the regex library's implementation).

Lots of overly complicated suggestions here. Try this simple std::string solution:

using namespace std;

string someText = ...

string::size_type tokenOff = 0, sepOff = tokenOff;
while (sepOff != string::npos)
{
    sepOff = someText.find(' ', sepOff);
    string::size_type tokenLen = (sepOff == string::npos) ? sepOff : sepOff++ - tokenOff;
    string token = someText.substr(tokenOff, tokenLen);
    if (!token.empty())
        /* do something with token */;
    tokenOff = sepOff;
}

I thought that was what the >> operator on string streams was for:

string word; sin >> word;

Adam Pierce's answer provides a hand-spun tokenizer taking a const char*. It's a bit more problematic to do with iterators, because incrementing a string's end iterator is undefined. That said, given string str{ "The quick brown fox" } we can certainly accomplish this:

auto start = find(cbegin(str), cend(str), ' ');
vector<string> tokens{ string(cbegin(str), start) };

while (start != cend(str)) {
    const auto finish = find(++start, cend(str), ' ');

    tokens.push_back(string(start, finish));
    start = finish;
}

Live Example

If you want to abstract away the complexity by using a standard function, as On Freund suggests, strtok is a simple option:

vector<string> tokens;

for (auto i = strtok(data(str), " "); i != nullptr; i = strtok(nullptr, " ")) tokens.push_back(i);

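If you don't have access to C++17, you'll need to substitute data(str), as in this example: http://ideone.com/8kAGoa

Though not demonstrated in the example, strtok need not use the same delimiter for each token. Along with this advantage, though, there are several drawbacks:

strtok cannot be used on multiple strings at the same time: either nullptr must be passed to continue tokenizing the current string, or a new char* must be passed to start tokenizing another one (there are some non-standard implementations that do support this, however, such as strtok_s)
For the same reason, strtok cannot be used on multiple threads simultaneously (this may however be implementation-defined; for example, Visual Studio's implementation is thread-safe)
Calling strtok modifies the string it is operating on, so it cannot be used on const strings, const char*s, or literal strings. To tokenize any of these with strtok, or to operate on a string whose contents need to be preserved, str would have to be copied, and the copy operated on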
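One way around all three drawbacks, assuming C++17 is available, is to scan a std::string_view with find_first_not_of / find_first_of: it never writes to the input, keeps no hidden state, and works on const strings and literals. A minimal sketch (split_view is a hypothetical name) that skips empty tokens the way strtok does:

```cpp
#include <string>
#include <string_view>
#include <vector>

// split_view (hypothetical): split a read-only view on any character in
// `delims`, skipping empty tokens, without modifying the input.
std::vector<std::string_view> split_view(std::string_view text, std::string_view delims)
{
    std::vector<std::string_view> tokens;
    std::string_view::size_type start = text.find_first_not_of(delims);
    while (start != std::string_view::npos) {
        const auto end = text.find_first_of(delims, start);
        tokens.push_back(text.substr(start, end == std::string_view::npos
                                                ? std::string_view::npos
                                                : end - start));
        start = text.find_first_not_of(delims, end); // npos if end is npos
    }
    return tokens;
}
```

The returned views point into the original text, so that text must outlive them.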

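Neither of the previous methods can generate a tokenized vector in place, meaning that without abstracting them into a helper function they can't initialize const vector<string> tokens. That functionality, and the ability to accept any whitespace delimiter, can be harnessed with an istream_iterator. For example, given const string str{ "The quick \tbrown \nfox" } we can do this: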

istringstream is{ str };
const vector<string> tokens{ istream_iterator<string>(is), istream_iterator<string>() };

Live Example

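The construction of an istringstream required by this option is far more expensive than the previous two options; however, this cost is typically hidden in the expense of string allocation.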

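If none of the above options is flexible enough for your tokenization needs, the most flexible option is a regex_token_iterator. Of course this flexibility comes at greater expense, but again this is likely hidden in the string allocation cost. Say, for example, that we want to tokenize on non-escaped commas, also eating whitespace, given the input const string str{ "The ,qu\\,ick ,\tbrown, fox" }. We can do this: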

const regex re{ "\\s*((?:[^\\\\,]|\\\\.)*?)\\s*(?:,|$)" };
const vector<string> tokens{ sregex_token_iterator(cbegin(str), cend(str), re, 1), sregex_token_iterator() };

Live Example

Here's an approach that lets you control whether empty tokens are included (like strsep) or excluded (like strtok).

#include <string.h> // for strchr and strlen
#include <string>
#include <vector>

/*
 * want_empty_tokens==true  : include empty tokens, like strsep()
 * want_empty_tokens==false : exclude empty tokens, like strtok()
 */
std::vector<std::string> tokenize(const char* src,
                                  char delim,
                                  bool want_empty_tokens)
{
  std::vector<std::string> tokens;

  if (src and *src != '\0') // defensive
    while( true )  {
      const char* d = strchr(src, delim);
      size_t len = (d)? d-src : strlen(src);

      if (len or want_empty_tokens)
        tokens.push_back( std::string(src, len) ); // capture token

      if (d) src += len+1; else break;
    }

  return tokens;
}

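Since we all have a speed-conscious nerd inside, it seems nobody has presented a version that uses a compile-time-generated lookup table for the delimiters (example implementation further down). Using a lookup table and iterators should beat std::regex in efficiency. If you don't need to beat regex, just use it: it's standard as of C++11 and super flexible.

Some have already suggested regex, but for the newcomers here is a packaged example that should do exactly what the OP expects: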

#include <iostream>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

std::vector<std::string> split(std::string::const_iterator it, std::string::const_iterator end, std::regex e = std::regex{"\\w+"}){
    std::smatch m{};
    std::vector<std::string> ret{};
    while (std::regex_search (it,end,m,e)) {
        ret.emplace_back(m.str());              
        std::advance(it, m.position() + m.length()); //next start position = match position + match length
    }
    return ret;
}
std::vector<std::string> split(const std::string &s, std::regex e = std::regex{"\\w+"}){  //comfort version calls flexible version
    return split(s.cbegin(), s.cend(), std::move(e));
}
int main ()
{
    std::string str {"Some people, excluding those present, have been compile time constants - since puberty."};
    auto v = split(str);
    for(const auto&s:v){
        std::cout << s << std::endl;
    }
    std::cout << "crazy version:" << std::endl;
    v = split(str, std::regex{"[^e]+"});  //using e as delim shows flexibility
    for(const auto&s:v){
        std::cout << s << std::endl;
    }
    return 0;
}

If we need it to be faster and can accept the constraint that all characters are 8 bits, we can build a lookup table at compile time using metaprogramming:

#include <algorithm>
#include <iostream>
#include <string>
#include <type_traits>
#include <utility>

template<bool...> struct BoolSequence{};        //just here to hold bools
template<char...> struct CharSequence{};        //just here to hold chars
template<typename T, char C> struct Contains;   //generic
template<char First, char... Cs, char Match>    //not first specialization
struct Contains<CharSequence<First, Cs...>,Match> :
    Contains<CharSequence<Cs...>, Match>{};     //strip first and keep searching
template<char First, char... Cs>                //is first specialization
struct Contains<CharSequence<First, Cs...>,First>: std::true_type {};
template<char Match>                            //not found specialization
struct Contains<CharSequence<>,Match>: std::false_type{};

template<int I, typename T, typename U>
struct MakeSequence;                            //generic
template<int I, bool... Bs, typename U>
struct MakeSequence<I,BoolSequence<Bs...>, U>:  //not last (explicit cast avoids an int->char narrowing error)
    MakeSequence<I-1, BoolSequence<Contains<U,static_cast<char>(I-1)>::value,Bs...>, U>{};
template<bool... Bs, typename U>
struct MakeSequence<0,BoolSequence<Bs...>,U>{   //last
    using Type = BoolSequence<Bs...>;
};
template<typename T> struct BoolASCIITable;
template<bool... Bs> struct BoolASCIITable<BoolSequence<Bs...>>{
    /* could be made constexpr but not yet supported by MSVC */
    static bool isDelim(const char c){
        static const bool table[256] = {Bs...};
        return table[static_cast<unsigned char>(c)]; //unsigned index: chars >= 128 stay in range
    }
    bool operator()(const char c) const { return isDelim(c); } //lets Table{} be used as a predicate
};
using Delims = CharSequence<'.',',',' ',':','\n'>;  //list your custom delimiters here
using Table = BoolASCIITable<typename MakeSequence<256,BoolSequence<>,Delims>::Type>;

With that in place, making a getNextToken function is easy:

template<typename T_It>
std::pair<T_It,T_It> getNextToken(T_It begin,T_It end){
    begin = std::find_if_not(begin,end,Table{});   //find first non delim or end
    auto second = std::find_if(begin,end,Table{}); //find first delim or end
    return std::make_pair(begin,second);
}

Using it is also easy:

int main() {
    std::string s{"Some people, excluding those present, have been compile time constants - since puberty."};
    auto it = std::begin(s);
    auto end = std::end(s);
    while(it != end){
        auto token = getNextToken(it,end);
        if(token.first != token.second)   //trailing delimiters leave an empty range
            std::cout << std::string(token.first,token.second) << std::endl;
        it = token.second;
    }
    return 0;
}

Live example: http://ideone.com/GKtkLQ
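If C++17 is available, a simpler way to get a compile-time table is a constexpr function that fills a std::array<bool, 256>. This is a swapped-in technique rather than the answer's metaprogram; the names make_delim_table, kDelims, is_delim and tokenize_table are hypothetical:

```cpp
#include <array>
#include <string>
#include <string_view>
#include <vector>

// Build the 256-entry delimiter table at compile time with a constexpr
// function (C++17) instead of template recursion.
constexpr std::array<bool, 256> make_delim_table(std::string_view delims)
{
    std::array<bool, 256> table{};                   // all false
    for (char c : delims)
        table[static_cast<unsigned char>(c)] = true; // unsigned index, never negative
    return table;
}

inline constexpr auto kDelims = make_delim_table(".,: \n"); // same set as Delims above

inline bool is_delim(char c) { return kDelims[static_cast<unsigned char>(c)]; }

// Collect all non-empty tokens between delimiters.
std::vector<std::string> tokenize_table(const std::string& s)
{
    std::vector<std::string> tokens;
    auto it = s.begin();
    while (it != s.end()) {
        while (it != s.end() && is_delim(*it)) ++it;  // skip delimiters
        auto first = it;
        while (it != s.end() && !is_delim(*it)) ++it; // scan the token
        if (first != it) tokens.emplace_back(first, it);
    }
    return tokens;
}
```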

I know this question has already been answered, but I want to contribute. Maybe my solution is a bit simple, but this is what I came up with:

vector<string> get_words(string const& text)
{
    vector<string> result;
    string tmp = text;

    size_t first_pos = 0;
    size_t second_pos = tmp.find(" ");

    while (second_pos != string::npos)
    {
        if (first_pos != second_pos)
        {
            string word = tmp.substr(first_pos, second_pos - first_pos);
            result.push_back(word);
        }
        tmp = tmp.substr(second_pos + 1);
        second_pos = tmp.find(" ");
    }

    if (!tmp.empty())       // a trailing space would otherwise add an empty word
        result.push_back(tmp);

    return result;
}

If there's a better approach or something wrong with my code, please comment.
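One possible improvement: each tmp = tmp.substr(...) copies the rest of the string, so the loop can take quadratic time on long input. A sketch that walks the original string with two indices instead (get_words_linear is a hypothetical name) and also skips empty words produced by repeated or trailing spaces:

```cpp
#include <string>
#include <vector>

// get_words_linear (hypothetical): split on ' ' without re-copying the
// remainder of the string for every word.
std::vector<std::string> get_words_linear(const std::string& text)
{
    std::vector<std::string> result;
    std::string::size_type first = 0;
    while (first <= text.size()) {
        std::string::size_type second = text.find(' ', first);
        if (second == std::string::npos)
            second = text.size();      // last word runs to the end
        if (second != first)           // skip empty words from repeated spaces
            result.push_back(text.substr(first, second - first));
        first = second + 1;            // step past the space
    }
    return result;
}
```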

There's no direct way to do this. Refer to this Code Project source code to find out how to build a class for it.

You can take advantage of boost::make_find_iterator. Something similar to this:

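#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>

using std::basic_string;
using std::vector;

template<typename CH>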
inline vector< basic_string<CH> > tokenize(
    const basic_string<CH> &Input,
    const basic_string<CH> &Delimiter,
    bool remove_empty_token
    ) {

    typedef typename basic_string<CH>::const_iterator string_iterator_t;
    typedef boost::find_iterator< string_iterator_t > string_find_iterator_t;

    vector< basic_string<CH> > Result;
    string_iterator_t it = Input.begin();
    string_iterator_t it_end = Input.end();
    for(string_find_iterator_t i = boost::make_find_iterator(Input, boost::first_finder(Delimiter, boost::is_equal()));
        i != string_find_iterator_t();
        ++i) {
        if(remove_empty_token){
            if(it != i->begin())
                Result.push_back(basic_string<CH>(it,i->begin()));
        }
        else
            Result.push_back(basic_string<CH>(it,i->begin()));
        it = i->end();
    }
    if(it != it_end)
        Result.push_back(basic_string<CH>(it,it_end));

    return Result;
}

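If you know the maximum length of the input string to be tokenized, you can exploit this to implement a very fast version. I'm sketching the basic idea below, inspired by both strtok() and the "suffix array" data structure described in Jon Bentley's "Programming Pearls", 2nd edition, chapter 15. The C++ class in this case only provides some organization and convenience of use. The implementation shown can easily be extended to remove leading and trailing whitespace characters from the tokens.

Basically, one can replace the separator characters with string-terminating '\0' characters and set pointers to the tokens within the modified string. In the extreme case where the string consists only of separators, one gets string-length plus 1 empty tokens. It is practical to duplicate the string to be modified.

Header file: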

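#include <assert.h>  // assert
#include <stddef.h>  // size_t, NULL
#include <string.h>  // memset, strncpy

class TextLineSplitter
{
public:

    TextLineSplitter( const size_t max_line_len );

    ~TextLineSplitter();

    // the class owns raw buffers, so copying is disabled (C++11)
    TextLineSplitter( const TextLineSplitter& ) = delete;
    TextLineSplitter& operator=( const TextLineSplitter& ) = delete;

    void            SplitLine( const char *line,
                               const char sep_char = ','
                             );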

    inline size_t   NumTokens( void ) const
    {
        return mNumTokens;
    }

    const char *    GetToken( const size_t token_idx ) const
    {
        assert( token_idx < mNumTokens );
        return mTokens[ token_idx ];
    }

private:
    const size_t    mStorageSize;

    char           *mBuff;
    char          **mTokens;
    size_t          mNumTokens;

    inline void     ResetContent( void )
    {
        memset( mBuff, 0, mStorageSize );
        // mark all items as empty:
        memset( mTokens, 0, mStorageSize * sizeof( char* ) );
        // reset counter for found items:
        mNumTokens = 0L;
    }
};

Implementation file:

TextLineSplitter::TextLineSplitter( const size_t max_line_len ):
    mStorageSize ( max_line_len + 1L )
{
    // allocate memory
    mBuff   = new char  [ mStorageSize ];
    mTokens = new char* [ mStorageSize ];

    ResetContent();
}

TextLineSplitter::~TextLineSplitter()
{
    delete [] mBuff;
    delete [] mTokens;
}


void TextLineSplitter::SplitLine( const char *line,
                                  const char sep_char   /* = ',' */
                                )
{
    assert( sep_char != '\0' );

    ResetContent();
    // copy at most max_line_len chars; the last byte of mBuff stays '\0'
    strncpy( mBuff, line, mStorageSize - 1 );

    size_t idx = 0L; // running index for characters
    char   chr;

    do
    {
        assert( idx < mStorageSize );

        chr = mBuff[ idx ]; // read the current character before it may be overwritten

        if( mTokens[ mNumTokens ] == NULL )
        {
            mTokens[ mNumTokens ] = &mBuff[ idx ];
        } // if

        if( chr == sep_char || chr == '\0' )
        { // item or line finished
            // overwrite separator with a 0-terminating character:
            mBuff[ idx ] = '\0';
            // count-up items:
            mNumTokens ++;
        } // if

        ++idx;
    } while( chr != '\0' );
}

Here is a usage scenario:

// create an instance capable of splitting strings up to 1000 chars long:
TextLineSplitter spl( 1000 );
spl.SplitLine( "Item1,,Item2,Item3" );
for( size_t i = 0; i < spl.NumTokens(); i++ )
{
    printf( "%s\n", spl.GetToken( i ) );
}

Output:

Item1

Item2
Item3

boost::tokenizer is your friend, but consider making your code portable with respect to internationalization (i18n) issues by using wstring/wchar_t instead of the legacy string/char types.

#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>

using namespace std;
using namespace boost;

typedef tokenizer<char_separator<wchar_t>,
                  wstring::const_iterator, wstring> Tok;

int main()
{
  wstring s;
  while (getline(wcin, s)) {
    char_separator<wchar_t> sep(L" "); // list of separator characters
    Tok tok(s, sep);
    for (Tok::iterator beg = tok.begin(); beg != tok.end(); ++beg) {
      wcout << *beg << L"\t"; // output (or store in vector)
    }
    wcout << L"\n";
  }
  return 0;
}
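If pulling in Boost is not an option, a standard-library-only sketch for wide strings could use std::getline on a std::wistringstream. It assumes a single wide delimiter character and drops empty fields, which is also char_separator's default behavior (split_wide is a hypothetical name):

```cpp
#include <sstream>
#include <string>
#include <vector>

// split_wide (hypothetical): split a wide string on one wide delimiter,
// skipping the empty fields produced by repeated delimiters.
std::vector<std::wstring> split_wide(const std::wstring& text, wchar_t delim = L' ')
{
    std::vector<std::wstring> out;
    std::wistringstream stream(text);
    std::wstring field;
    while (std::getline(stream, field, delim))
        if (!field.empty())
            out.push_back(field);
    return out;
}
```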

Simple C++ code (standard C++98) that accepts multiple delimiters (specified in a std::string) and uses only vectors, strings, and iterators.

#include <iostream>
#include <vector>
#include <string>
#include <stdexcept> 

std::vector<std::string> 
split(const std::string& str, const std::string& delim){
    std::vector<std::string> result;
    if (str.empty())
        throw std::runtime_error("Can not tokenize an empty string!");
    std::string::const_iterator begin, str_it;
    begin = str_it = str.begin(); 
    do {
        // check for end() before dereferencing str_it
        while (str_it != str.end() && delim.find(*str_it) == std::string::npos)
            str_it++; // find the position of the first delimiter in str
        std::string token = std::string(begin, str_it); // grab the token
        if (!token.empty()) // empty token only when str starts with a delimiter
            result.push_back(token); // push the token into a vector<string>
        while (str_it != str.end() && delim.find(*str_it) != std::string::npos)
            str_it++; // ignore the additional consecutive delimiters
        begin = str_it; // process the remaining tokens
    } while (str_it != str.end());
    return result;
}

int main() {
    std::string test_string = ".this is.a.../.simple;;test;;;END";
    std::string delim = "; ./"; // string containing the delimiters
    std::vector<std::string> tokens = split(test_string, delim);           
    for (std::vector<std::string>::const_iterator it = tokens.begin(); 
        it != tokens.end(); it++)
            std::cout << *it << std::endl;
}

Source URL: https://stackoverflow.com/questions/53849/how-do-i-tokenize-a-string-in-c
