Check BsonObjectID length from a String

Chuck

Posted Sat, 11 Jul 2015 20:58:45 GMT

Is there a good way to check that a string has the proper length before I convert it to a BsonObjectID? The object ID must be 24 characters long, which works fine with ASCII characters, but breaks if someone adds an accented letter (e.g. é), as this takes up two bytes in UTF-8. I know that I can use regular expressions to halt the program if an accented (or other non-ASCII) letter is found, but is there a more proper way to do this?

When I do an ID search for BsonObjectID.fromString("55981e8fbea1ebc0dba665a8"), it works fine.
When I do an ID search that has an accented letter, 55981é8fbea1ebc0dba665a8, it breaks.

Re: Check BsonObjectID length from a String

Taylor Gronka

Posted Sat, 18 Jul 2015 00:26:47 GMT in reply to Chuck

D has three character types, and therefore three string types, if I'm not mistaken: char, wchar, and dchar, with string, wstring, and dstring to match. dchar and dstring (or wchar and wstring) are probably the types you need to use.

I haven't used them myself yet, so I'm not sure which one you'll want to use, but this example uses dstring for the character you're specifying:
http://ddili.org/ders/d.en/strings.html

Re: Check BsonObjectID length from a String

Chuck

Posted Sat, 18 Jul 2015 04:45:21 GMT in reply to Taylor Gronka

Thank you very much. That's what I was looking for.

Re: Check BsonObjectID length from a String

Sönke Ludwig

Posted Sun, 19 Jul 2015 06:49:21 GMT in reply to Chuck

Usually you should get an exception within BsonObjectID.fromString that you can catch to handle malformed ID strings. Does that happen in your case, or do you get some other kind of breakage?

Checking the length can happen on three levels. In this case, since only ASCII characters are allowed, all three yield the same value for valid ID strings:

string.length: Returns the number of bytes (UTF-8 code units) the string occupies. This will be greater than (for non-ASCII) or equal to (for ASCII) the numbers below.

walkLength(string) (in std.range): Returns the number of dchars (UTF-32 code points). This matches the number of actual characters more often than the previous one, but there are so-called combining characters (https://en.wikipedia.org/wiki/Combining_character) that can still yield a number greater than the count of characters drawn on-screen.

walkLength(byGrapheme(string)) (byGrapheme is in std.uni): Returns the number of actual logical characters. This is what a human would see, so you'd use this whenever actually rendered text is involved.

So for just testing the correctness of the input, using string.length and then checking that each char is within the range of allowed characters is sufficient. But to get a proper error message for the user, you'd have to use the third option, even though the second will seem to yield correct results most of the time (at least on Windows or Linux...).
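
To make the three levels concrete, here is a minimal sketch using the ID from the question. It assumes walkLength is imported from std.range and byGrapheme from std.uni; the variable names are only illustrative, and the accented letter is written as an escape so the result does not depend on how the source file is encoded.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    // The ID from the question with U+00E9 (precomposed é) replacing the first 'e'.
    string accented = "55981\u00E98fbea1ebc0dba665a8";

    assert(accented.length == 25);                // bytes: é needs two UTF-8 code units
    assert(accented.walkLength == 24);            // code points
    assert(accented.byGrapheme.walkLength == 24); // graphemes (what a reader sees)

    // 'e' followed by U+0301 (combining acute accent) is drawn as a single é,
    // so here all three levels disagree.
    string combining = "e\u0301";

    assert(combining.length == 3);                // 1 byte + 2 bytes
    assert(combining.walkLength == 2);            // two code points
    assert(combining.byGrapheme.walkLength == 1); // one visible character
}
```

For a valid object ID string all three counts are 24, which is why the plain byte length is enough for validation.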

Re: Check BsonObjectID length from a String

Chuck

Posted Tue, 21 Jul 2015 06:34:55 GMT in reply to Sönke Ludwig

The solution I came up with uses both string.length and .isAlphaNum. I have had no further issues. Before, the program halted with an error stating that the object ID was invalid. It also broke when the string contained a character that was not hex (e.g. something that was neither a digit nor a letter up to 'f'), stating that it was an invalid hex string.

My solution:

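import std.ascii : isAlphaNum, toLower;

bool validateID(string id)
{
  // id.length counts UTF-8 bytes, so an accented letter pushes it past 24.
  if (id.length != 24)
    return false;
  foreach (char c; id)
  {
    // Accept only 0-9, a-f and A-F. Lowering the case first also rejects
    // 'G'..'Z', which sort below 'f' in ASCII.
    if (!c.isAlphaNum || c.toLower > 'f')
      return false;
  }
  return true;
}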
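
For comparison, the same check (byte length plus a per-character range test) could be written more compactly with std.ascii.isHexDigit. This is only a sketch: the name isObjectIdString is made up for illustration, and byCodeUnit from std.utf is used so the string is inspected byte by byte without UTF decoding.

```d
import std.algorithm.searching : all;
import std.ascii : isHexDigit;
import std.utf : byCodeUnit;

/// Hypothetical helper: true only if id consists of exactly 24 ASCII hex digits.
bool isObjectIdString(string id)
{
    // The bytes of any non-ASCII character are >= 0x80, so they can never be
    // hex digits, and the length test is therefore a plain byte count.
    return id.length == 24 && id.byCodeUnit.all!isHexDigit;
}

void main()
{
    assert(isObjectIdString("55981e8fbea1ebc0dba665a8"));
    assert(!isObjectIdString("55981\u00E98fbea1ebc0dba665a8")); // é makes it 25 bytes
    assert(!isObjectIdString("55981g8fbea1ebc0dba665a8"));      // 'g' is not hex
    assert(!isObjectIdString("55981e8f"));                      // too short
}
```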