RejectedSoftware Forums


std_data_json and vibe.d

Hi Sönke, I thought I'd post this question here rather than on the github project, as this concerns both the vibe.d project and std_data_json.

Since my last post on the JSON library in vibe.d, I have had a good look at the std_data_json library. I note that you have not changed much in it of late, but also that most functionality seems to work well and it's well documented. Do you currently have a list of outstanding features or issues, or are you pretty happy with it? If there is anything that still needs to be done, would it be worth posting the missing features or outstanding issues on github?

It does feel more like a standard library, which I know it is intended to become. One great loss compared to the vibe.d version is the opDispatch overload; the other is the rather verbose get syntax (e.g. obj.get!(JSONValue[string])["a"]). You mentioned that these were due to limitations of Algebraic. Is this something you feel needs to be improved before the library is used on a larger scale, or do you think it will stay as it is unless Algebraic itself is altered or improved?
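For context, this is roughly the difference I mean. It's just a sketch from memory, so the module and function names on the std_data_json side (stdx.data.json, toJSONValue) are my assumptions rather than verbatim library code:

    import vibe.data.json;  // old vibe.d Json
    import stdx.data.json;  // new std_data_json JSONValue (module name assumed)

    void accessExample()
    {
        // Old vibe.d Json: opDispatch makes dictionary access look like member access.
        Json old = parseJsonString(`{"a": 1}`);
        auto x = old.a;    // convenient, but "a" is not a compile-time checked member
        auto y = old["a"]; // equivalent index syntax

        // New JSONValue: the underlying Algebraic has to be unwrapped first.
        JSONValue val = toJSONValue(`{"a": 1}`); // parsing helper name assumed
        auto z = val.get!(JSONValue[string])["a"];
    }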

Is your plan for this library to drop it into vibe.d at some point, write a serializer for the type, and thereby turn it into an (almost) drop-in replacement for the existing library?

Lastly, the Bson type in vibe.d looks as if it started its life as a copy of the Json object (indeed, there are a few places in the documentation that say Json when it means Bson). I feel the Json object has had a fair bit of work done on it since, leaving the Bson object a little out of date. We have worked around this by doing most of our query building in Json, as it translates well enough to Bson anyway. Do you have any thoughts on this? Do you see yourself (or any of us) replacing the vibe.d Bson object with one based on std_data_json?
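To illustrate the workaround: we build the query with the friendlier Json API and only convert it at the last moment. A minimal sketch (I believe Bson.fromJson is the right conversion, but treat the exact call as an assumption):

    import vibe.data.json;
    import vibe.data.bson;

    Bson buildQuery()
    {
        // Build the MongoDB query with the Json API...
        Json query = Json.emptyObject;
        query["age"] = Json(["$gte": Json(21)]);

        // ...then translate it to Bson just before handing it to the driver.
        return Bson.fromJson(query);
    }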

Apologies for the essay, I'm really interested in where this is going so as to know the best place to work and contribute.

Cheers,

David.

Re: std_data_json and vibe.d

On Thu, 13 Nov 2014 11:15:24 GMT, David Monagle wrote:

Apologies for the essay, I'm really interested in where this is going so as to know the best place to work and contribute.

I can't say much about the direction it's taking, but there's a little bit of room for improvement around buffering, mostly for big JSON documents (> 1 MB).

  • There's currently no way of accessing part of the schema during the transfer, which would be useful to stop the transfer if the metadata is wrong (e.g. a session ID mixed in with JSON data that contains a base64 image).

  • There's the possibility of a memory exhaustion attack with JSON documents that exceed a certain size; currently, the entire document piles up in RAM.

I have a "mix" of solutions that also enable other openings ie. in the database world. This involves using Json as a schema, with the values being light integers to file offsets. The Json object would query a disk file for the string data, and throw an error if the schema has too many keys. If you add a small log writer / reader for the BigJson modifications, you get a nice little embedded database.

Other than that, anything related to the standards seems nicely implemented.

Re: std_data_json and vibe.d

On 13.11.2014 12:15, David Monagle wrote:

Hi Sönke, I thought I'd post this question here rather than on the github project, as this concerns both the vibe.d project and std_data_json.

Since my last post on the JSON library in vibe.d, I have had a good look at the std_data_json library. I note that you have not changed much in it of late, but also that most functionality seems to work well and it's well documented. Do you currently have a list of outstanding features or issues, or are you pretty happy with it? If there is anything that still needs to be done, would it be worth posting the missing features or outstanding issues on github?

If I remember right, the last thing I was working on was support for optional UTF validation of input strings. The implementation was done, but I was still missing unit tests. Once that is done, the library could enter the official review process.

It does feel more like a standard library, which I know it is intended to become. One great loss compared to the vibe.d version is the opDispatch overload; the other is the rather verbose get syntax (e.g. obj.get!(JSONValue[string])["a"]). You mentioned that these were due to limitations of Algebraic. Is this something you feel needs to be improved before the library is used on a larger scale, or do you think it will stay as it is unless Algebraic itself is altered or improved?

Using opDispatch, IMO, was an early mistake on my side. While it is definitely convenient, it also has two major drawbacks. First and foremost, it makes any additional method/property or UFCS "method" on JSONValue a breaking change. And second, in the context of a statically compiled language, it can give the false impression of accessing a name-checked member instead of a dynamic dictionary key, which can obscure bugs. IMO, the way to go in most cases is to facilitate the use of a serialization solution and store all in-memory data in actual D structures instead of JSONValue.
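To illustrate with the serializer that already exists in vibe.data.json (the std_data_json counterpart is meant to work the same way; this is just a sketch):

    import vibe.data.json;

    struct User
    {
        string name;
        int age;
    }

    void serializationExample()
    {
        // Work with a statically checked D struct instead of poking at JSONValue.
        auto u = deserializeJson!User(`{"name": "David", "age": 42}`);
        assert(u.name == "David" && u.age == 42);

        // And back to JSON when the data leaves the program.
        string s = serializeToJsonString(u);
    }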

I don't remember exactly, but I think the fixes for the verbose syntax should make their way into DMD 2.067, so that shouldn't be an issue, should the library get accepted at some point.

Is your plan for this library to drop it into vibe.d at some point, write a serializer for the type, and thereby turn it into an (almost) drop-in replacement for the existing library?

The practical solution will probably involve adding an implicit cast of the new JSONValue type to the old vibe.data.json.Json type, and then later deprecating and removing Json. There will also definitely be a matching serializer implementation, so everything is indeed supposed to be a drop-in replacement.
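The conversion itself could be as simple as an alias this on the new type; the following is only a sketch of the mechanism, not the planned implementation:

    import vibe.data.json : Json;

    // Hypothetical stand-in for the new value type.
    struct JSONValue
    {
        // ... actual std_data_json storage would live here ...

        Json toVibeJson() const
        {
            Json j;
            // ... convert the stored data into the old representation ...
            return j;
        }

        // Lets a JSONValue be passed wherever the old Json is expected,
        // so existing code keeps compiling during the deprecation period.
        alias toVibeJson this;
    }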

Lastly, the Bson type in vibe.d looks as if it started its life as a copy of the Json object (indeed, there are a few places in the documentation that say Json when it means Bson). I feel the Json object has had a fair bit of work done on it since, leaving the Bson object a little out of date. We have worked around this by doing most of our query building in Json, as it translates well enough to Bson anyway. Do you have any thoughts on this? Do you see yourself (or any of us) replacing the vibe.d Bson object with one based on std_data_json?

I'd probably directly aim for a std.data.bson module, but then that was my plan anyway. The new module would also use an Algebraic to store the data and would also provide a range-based parser akin to the std_data_json one.
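Something along these lines, storage-wise (a hypothetical sketch of the Algebraic idea only; a real module would need the full set of BSON types such as ObjectId and UTC datetime):

    import std.variant : Algebraic, This;

    // Hypothetical Algebraic-backed BSON value, analogous to how
    // std_data_json stores JSONValue.
    alias BSONValue = Algebraic!(
        typeof(null),        // null
        bool,                // boolean
        long,                // int32/int64
        double,              // double
        string,              // UTF-8 string
        immutable(ubyte)[],  // binary data
        This[],              // array of BSON values
        This[string]         // embedded document
    );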

Apologies for the essay, I'm really interested in where this is going so as to know the best place to work and contribute.

No problem, and excuse my late response (well, this one actually isn't as late as others). I'll still be extremely busy until early next year, but I'll try to finalize the UTF validation feature, so that the review process can start anyway.

Re: std_data_json and vibe.d

On 13.11.2014 16:10, Etienne Cimon wrote:

On Thu, 13 Nov 2014 11:15:24 GMT, David Monagle wrote:

Apologies for the essay, I'm really interested in where this is going so as to know the best place to work and contribute.

I can't say much about the direction it's taking, but there's a little bit of room for improvement around buffering, mostly for big JSON documents (> 1 MB).

  • There's currently no way of accessing part of the schema during the transfer, which would be useful to stop the transfer if the metadata is wrong (e.g. a session ID mixed in with JSON data that contains a base64 image).

  • There's the possibility of a memory exhaustion attack with JSON documents that exceed a certain size; currently, the entire document piles up in RAM.

I have a "mix" of solutions that also enable other openings ie. in the database world. This involves using Json as a schema, with the values being light integers to file offsets. The Json object would query a disk file for the string data, and throw an error if the schema has too many keys. If you add a small log writer / reader for the BigJson modifications, you get a nice little embedded database.

Other than that, anything related to the standards seems nicely implemented.

There is the range-based parser, which doesn't store anything other than the current parser node, so it is possible to skip parts of a big document. However, string values are always stored in memory (they could be a slice of a memory-mapped buffer), so large Base64 strings, for example, could still be an issue.
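In other words, something like this should already work with the stream parser; I'm writing this from memory, so take the exact function name (parseJSONStream) as an assumption and check the module documentation:

    import stdx.data.json.parser; // module name as published in the std_data_json package

    void streamExample()
    {
        auto input = `{"session": "abc123", "image": "...huge base64 payload..."}`;

        // The pull parser yields one node at a time; apart from the current
        // node (and its string value, if any), nothing is kept in memory.
        auto nodes = parseJSONStream(input);

        foreach (node; nodes)
        {
            // Inspect keys and values as they stream by and abort early if
            // the metadata looks wrong, instead of materializing the whole
            // document first.
        }
    }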