SimpCfg: Fixed & ~Variable Length to Native & MultiNativeCharSize
So as to make the notes, more generic.
This commit is contained in:
parent
d030a26f3c
commit
5b8bf849c0
1 changed files with 28 additions and 24 deletions
|
@ -97,30 +97,32 @@ void dumphex_string(const TString &sIn, const std::string &msgTag){
|
||||||
// belongs to one of the chars in trimChars.
|
// belongs to one of the chars in trimChars.
|
||||||
//
|
//
|
||||||
// NOTE: This will work perfectly provided the string being trimmed as well as
|
// NOTE: This will work perfectly provided the string being trimmed as well as
|
||||||
// chars being trimmed are made up of FixedSize chars from the same encoded space.
|
// chars being trimmed are made up of NativeCharSize chars from same encoded space.
|
||||||
// For utf-8, this means the ascii equivalent 1byteSized chars of utf8 and not
|
// For utf-8, this means the ascii equivalent 1byteSized chars of utf8 and not
|
||||||
// variable length ones.
|
// variable length MultiNativeCharSize (ie multibye in case of utf-8) ones.
|
||||||
// NOTE: It will also work, if atleast either end of string have fixedSize chars
|
// NOTE: It will also work, if atleast either end of string as well as trimChars
|
||||||
// from their encoding space, rather than variable length based chars if any.
|
// have NativeCharSize chars from their encoding space, rather than variable
|
||||||
// And the trimChars are also fixedSize encoded chars.
|
// length MultiNativeCharSize based chars if any.
|
||||||
//
|
//
|
||||||
// NOTE: Given the way UTF-8 char encoding is designed, where fixedSize 1byte
|
// NOTE: Given the way UTF-8 char encoding is designed, where NativeCharSize 1byte
|
||||||
// encoded chars are fully unique and dont overlap with any bytes from any of
|
// encoded chars are fully unique and dont overlap with any bytes from any of the
|
||||||
// the variable length encoded chars in the utf-8 space, so as long as the
|
// variable length MultiNativeCharSize encoded chars in the utf-8 space, so as long as
|
||||||
// trimChars belong to the fixedSize chars subset, the logic should work, even
|
// the trimChars belong to NativeCharSize chars subset, the logic should work, even
|
||||||
// if the string has a mixture of fixed and variable length encoded chars.
|
// if string has a mixture of NativeCharSize and MultiNativeCharSize encoded chars.
|
||||||
// Chances are utf-16 and utf-32 also have similar characteristics wrt thier
|
// Chances are utf-16 and utf-32 also have similar characteristics wrt thier
|
||||||
// fixedSize encoded chars, and so equivalent semantic applies to them also.
|
// NativeCharSize encoded chars (ie fully encoded within single 16bit and 32bit value
|
||||||
|
// respectively), and so equivalent semantic applies to them also.
|
||||||
//
|
//
|
||||||
// ALERT: Given that this simple minded logic, works at individual bytes level
|
// ALERT: Given that this simple minded logic, works at individual NativeCharSize level
|
||||||
// only, If trimChars involve variable length encoded chars, then
|
// only, If trimChars involve variable length MultiNativeCharSize encoded chars, then
|
||||||
// * because different bytes from different trim chars when clubbed together
|
// * because different NativeCharSize subparts (bytes in case of utf-8) from different
|
||||||
// can map to some other new char, if there is that new char at either end
|
// MultiNativeCharSize trim chars when clubbed together can map to some other new char
|
||||||
// of the string, it may get trimmed, because of the possibility of mix up
|
// in a variable length encoded char space, if there is that new char at either end
|
||||||
// mentioned.
|
// of the string, it may get trimmed, because of the possibility of mix up mentioned.
|
||||||
// * given that different variable length encoded chars may have some common
|
// * given that different variable length MultiNativeCharSize encoded chars may have
|
||||||
// bytes between them, if one of these chars is at either end of the string
|
// some common NativeCharSize subparts (bytes in case of utf-8) between them, if one
|
||||||
// and another char is in trimChars, then string may get partially trimmed.
|
// of these chars is at either end of the string and another char is in trimChars,
|
||||||
|
// then string may get partially trimmed.
|
||||||
//
|
//
|
||||||
template <typename TString>
|
template <typename TString>
|
||||||
TString str_trim_dumb(TString sin, const TString &trimChars=" \t\n") {
|
TString str_trim_dumb(TString sin, const TString &trimChars=" \t\n") {
|
||||||
|
@ -137,8 +139,10 @@ TString str_trim_dumb(TString sin, const TString &trimChars=" \t\n") {
|
||||||
// to one of the chars in trimChars.
|
// to one of the chars in trimChars.
|
||||||
// NOTE: Internally converts to wchar/wstring to try and support proper trimming,
|
// NOTE: Internally converts to wchar/wstring to try and support proper trimming,
|
||||||
// wrt possibly more languages, to some extent. IE even if the passed string
|
// wrt possibly more languages, to some extent. IE even if the passed string
|
||||||
// contains multibyte encoded characters in it in utf-8 space, it may get converted
|
// contains multibyte encoded characters in it in utf-8 space (ie MultiNativeCharSize),
|
||||||
// to fixed size chars in the expanded wchar_t encoding space.
|
// it may get converted to NativeCharSize chars in the expanded wchar_t encoding space,
|
||||||
|
// thus leading to fixed NativeCharSize driven logic itself handling things sufficiently.
|
||||||
|
// Look at str_trim_dumb comments for additional aspects.
|
||||||
std::string str_trim_oversmart(std::string sIn, const std::string &trimChars=" \t\n") {
|
std::string str_trim_oversmart(std::string sIn, const std::string &trimChars=" \t\n") {
|
||||||
std::wstring wIn;
|
std::wstring wIn;
|
||||||
mbs_to_wcs(wIn, sIn);
|
mbs_to_wcs(wIn, sIn);
|
||||||
|
@ -152,8 +156,8 @@ std::string str_trim_oversmart(std::string sIn, const std::string &trimChars=" \
|
||||||
|
|
||||||
// Remove atmost 1 char at the begin and 1 char at the end of the passed string,
|
// Remove atmost 1 char at the begin and 1 char at the end of the passed string,
|
||||||
// provided the char belongs to one of the chars in trimChars.
|
// provided the char belongs to one of the chars in trimChars.
|
||||||
// NOTE: Chars being trimmed (ie in trimChars) needs to be 1byte encoded chars, to
|
// NOTE: Chars being trimmed (ie in trimChars) needs to be FixedSize encoded chars,
|
||||||
// avoid mix up when working utf-8/variable length encoded strings.
|
// to avoid mix up when working with strings which can utf-8/variable length encoded strings.
|
||||||
// NOTE: This will work provided the string being trimmed as well the chars being
|
// NOTE: This will work provided the string being trimmed as well the chars being
|
||||||
// trimmed are made up of 1byte encoded chars including in utf8 encoding space.
|
// trimmed are made up of 1byte encoded chars including in utf8 encoding space.
|
||||||
// If the string being trimmed includes multibyte encoded characters at the end,
|
// If the string being trimmed includes multibyte encoded characters at the end,
|
||||||
|
|
Loading…
Add table
Add a link
Reference in a new issue