Should packages be restricted to a single name space?

Re: Should packages be restricted to a single name space?

Sönke Ludwig

Posted Sun, 10 Mar 2013 21:14:42 GMT in reply to Robert

After reading the latest proposal, I'm a bit surprised that it works exactly (except for the new :slot concept) like I thought and my assumptions are actually true. So in particular in my example the following things would be illegal:

the main package must not be named "vibe-d", but just "vibe" (bad that it is already named "vibe-d")
vibe-core may not have that name because vibe.inet and vibe.utils are also contained in it. In fact there is no way to group these name spaces together in one (really just one) package. Note that I'm not talking about repositories here.
vibe-ext has the same problems as vibe-core

So I would have to actually create one package for each of vibe.* - all of them would be publicly visible in the registry, even if using just vibe.inet or vibe.utils makes not much sense. Moreover I discovered that there are a few cyclic dependencies between some modules (yeah, those should be fixed) - how would I handle those? Allow cyclic dependencies for packages? Seems like no good idea on that level.
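To make the restriction concrete, the check it implies can be sketched in a few lines. This is only an illustration in Python, not part of dub or of the proposal: `modules_outside_namespace` is a hypothetical helper, and the module names are made-up stand-ins for the vibe.* layout described above.

```python
def modules_outside_namespace(package_name, modules):
    """Return the modules that do not live under the package's own name space."""
    prefix = package_name + "."
    return sorted(
        m for m in modules
        if m != package_name and not m.startswith(prefix)
    )


# Hypothetical module list: a "vibe.core" package that also ships
# modules from vibe.inet and vibe.utils (names invented for the example).
example_modules = ["vibe.core.file", "vibe.inet.url", "vibe.utils.array"]

# Under the single-name-space rule, "vibe.core" may not contain the last two.
print(modules_outside_namespace("vibe.core", example_modules))

# A package simply named "vibe" would be allowed to hold all of them.
print(modules_outside_namespace("vibe", example_modules))
```

Under this reading, grouping vibe.core, vibe.inet and vibe.utils is only legal in a package that claims the whole `vibe` name space.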

I have a few comments on the new proposal, but before going further with this, I want to ask again (to use Andrei's words): Does it pull its own weight?

Frankly, I see no problem that would hinder a system to exist that can sufficiently represent the existing library landscape and at the same time enforce that package name/name space restriction in some way. And your solution (although I still see some issues) may well be able to do that or be extended to do that. But that is not the important question.

The real question is, what is better:

Relatively complex(*) system, but (also just partially) enforced rules
Simple system with just convention

(*) And I don't (just) mean implementation complexity

Also, if you think about it, the :slot solution is basically not much different than allowing arbitrary names, just that there is a prefix that coincides with the name space.

But really, before going further into details, I really want to have basis to estimate how much we'd actually (in practice) gain from the whole thing. I have the feeling that all of this solves a problem that doesn't exist in the first place (i.e. I want a proof that these conflicts, not counting forks of the same library, actually happen in reality, in any other package system that doesn't have such rules).

Sorry, I really don't want to simply turn this down, but my gut feeling that all this may do more harm than good has actually grown, even though some issues are resolved by your latest proposal. I want to be sure that this is not just wasting time because some precondition was wrong.

Re: Should packages be restricted to a single name space?

Robert

Posted Mon, 11 Mar 2013 13:05:52 GMT in reply to Sönke Ludwig

On Sun, 10 Mar 2013 21:14:42 GMT, Sönke Ludwig wrote:

After reading the latest proposal, I'm a bit surprised that it works exactly (except for the new :slot concept) like I thought and my assumptions are actually true. So in particular in my example the following things would be illegal:

the main package must not be named "vibe-d", but just "vibe" (bad that it is already named "vibe-d")

True.

vibe-core may not have that name because vibe.inet and vibe.utils are also contained in it. In fact there is no way to group these name spaces together in one (really just one) package. Note that I'm not talking about repositories here.

vibe-ext has the same problems as vibe-core

So I would have to actually create one package for each of vibe.* - all of them would be publicly visible in the registry, even if using just vibe.inet or vibe.utils makes not much sense. Moreover I discovered that there are a few cyclic dependencies between some modules (yeah, those should be fixed) - how would I handle those? Allow cyclic dependencies for packages? Seems like no good idea on that level.

If you want to sub-package your vibe package exactly this way, probably, except for those sub D packages which are included in the vibe package itself.

Cyclic dependencies always sound bad, but in this case I don't know what the actual problem of cyclic dependencies is? All it means is that you always get both packages, just like if it was a single one. Reasonable argument: "Why not just make it a single one?" My question: "Why does it need to be a single one?" If this in practice is not the exception, but the rule then of course my proposal sucks! :-)

I have a few comments on the new proposal, but before going further with this, I want to ask again (to use Andrei's words): Does it pull its own weight?

Frankly, I see no problem that would hinder a system to exist that can sufficiently represent the existing library landscape and at the same time enforce that package name/name space restriction in some way. And your solution (although I still see some issues) may well be able to do that or be extended to do that. But that is not the important question.

The real question is, what is better:

Relatively complex(*) system, but (also just partially) enforced rules

Simple system with just convention

(*) And I don't (just) mean implementation complexity

Also, if you think about it, the :slot solution is basically not much different than allowing arbitrary names, just that there is a prefix that coincides with the name space.

Exactly and this nice property makes it straight forward to see what name spaces are already claimed by just looking at available packages. It also makes it easy (for both tools and humans) to find the package belonging to some import. Conflicts can easily be avoided and especially can not be introduced by accident and if they occur they are already detected by the package manager and don't simply cause the compiler to use the wrong module(s) silently.

About complexity: One nice property of my proposal is, that creating multiple packages from a single repository is as easy as specifying a name, which causes the corresponding D package to be a package-manager package.

Do we want repositories to export multiple packages in the future? And if yes, how would you do it?

I am of course biased, but I still don't consider the proposal to be complex, rather straight forward. Of course, good reasons why one would like to have a package-manager package which is not a single D-package, would cause the proposal to be doomed. Your example based on vibe, is in fact quite a good one, if having non standalone packages and cyclic dependencies are considered a problem.

Having said this, as I don't know how to enhance this proposal any further, neither on a technical basis nor on a marketing/explanatory basis, I reached a point where I can say if I haven't convinced you yet and users don't like it either, I am good to drop it and accept that D-packages don't map as nicely to package-manager packages as I thought they would.

The proposal builds on the assumption that the D-package is the natural way of grouping things together, if in practice this is not the case and it is frequent that you want to group things differently, then this proposal is in fact just making things more complex, instead of making them easier and should by all means be dropped.

Best regards,

Robert

Re: Should packages be restricted to a single name space?

Sönke Ludwig

Posted Mon, 11 Mar 2013 19:57:34 GMT in reply to Robert

On Mon, 11 Mar 2013 13:05:52 GMT, Robert wrote:

vibe-core may not have that name because vibe.inet and vibe.utils are also contained in it. In fact there is no way to group these name spaces together in one (really just one) package. Note that I'm not talking about repositories here.

vibe-ext has the same problems as vibe-core

So I would have to actually create one package for each of vibe.* - all of them would be publicly visible in the registry, even if using just vibe.inet or vibe.utils makes not much sense. Moreover I discovered that there are a few cyclic dependencies between some modules (yeah, those should be fixed) - how would I handle those? Allow cyclic dependencies for packages? Seems like no good idea on that level.

If you want to sub-package your vibe package exactly this way, probably, except for those sub D packages which are included in the vibe package itself.

Cyclic dependencies always sound bad, but in this case I don't know what the actual problem of cyclic dependencies is? All it means is that you always get both packages, just like if it was a single one. Reasonable argument: "Why not just make it a single one?" My question: "Why does it need to be a single one?" If this in practice is not the exception, but the rule then of course my proposal sucks! :-)

I see a number of issues with the build process and cyclic dependencies. As one example, just assume that both such dependencies are compiled as a shared library and think about how linking will work.
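One way to see the build problem: a build or link order is a topological ordering of the dependency graph, and a cycle means that no such ordering exists. The Python sketch below illustrates this. The function and the package names are hypothetical and say nothing about how dub actually resolves dependencies.

```python
def build_order(dependencies):
    """Return a build order with dependencies first, or None if there is a cycle.

    `dependencies` maps a package name to the set of packages it depends on.
    """
    remaining = {pkg: set(deps) for pkg, deps in dependencies.items()}
    # Packages that are only mentioned as a dependency have no dependencies.
    for deps in dependencies.values():
        for dep in deps:
            remaining.setdefault(dep, set())

    order = []
    while remaining:
        ready = sorted(pkg for pkg, deps in remaining.items() if not deps)
        if not ready:
            # Every remaining package still waits on another one: a cycle.
            return None
        for pkg in ready:
            order.append(pkg)
            del remaining[pkg]
        for deps in remaining.values():
            deps.difference_update(ready)
    return order


# Hypothetical package graphs.
acyclic = {"app": {"vibe.http"}, "vibe.http": {"vibe.core"}, "vibe.core": set()}
cyclic = {"vibe.http": {"vibe.core"}, "vibe.core": {"vibe.http"}}

print(build_order(acyclic))  # ['vibe.core', 'vibe.http', 'app']
print(build_order(cyclic))   # None
```

With two shared libraries that depend on each other, neither can be linked first, which is the situation the second graph models.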

I have a few comments on the new proposal, but before going further with this, I want to ask again (to use Andrei's words): Does it pull its own weight?

Frankly, I see no problem that would hinder a system to exist that can sufficiently represent the existing library landscape and at the same time enforce that package name/name space restriction in some way. And your solution (although I still see some issues) may well be able to do that or be extended to do that. But that is not the important question.

The real question is, what is better:

Relatively complex(*) system, but (also just partially) enforced rules

Simple system with just convention

(*) And I don't (just) mean implementation complexity

Also, if you think about it, the :slot solution is basically not much different than allowing arbitrary names, just that there is a prefix that coincides with the name space.

Exactly and this nice property makes it straight forward to see what name spaces are already claimed by just looking at available packages. It also makes it easy (for both tools and humans) to find the package belonging to some import. Conflicts can easily be avoided and especially can not be introduced by accident and if they occur they are already detected by the package manager and don't simply cause the compiler to use the wrong module(s) silently.

But if the registry could instead just tell you which namespace(s) a package occupies or vice versa, you have a very similar effect (just that the user would have to look at a namespace list instead of a package list to see if a particular namespace is already occupied).

About complexity: One nice property of my proposal is, that creating multiple packages from a single repository is as easy as specifying a name, which causes the corresponding D package to be a package-manager package.

Do we want repositories to export multiple packages in the future? And if yes, how would you do it?

I am of course biased, but I still don't consider the proposal to be complex, rather straight forward. Of course, good reasons why one would like to have a package-manager package which is not a single D-package, would cause the proposal to be doomed. Your example based on vibe, is in fact quite a good one, if having non standalone packages and cyclic dependencies are considered a problem.

The length of the proposal gives a hint of what I mean with complexity. Even if some of it is just examples and some parts are specific to our discussion, it still contains a number of new concepts that every user has to understand on top of what is already there.

Having said this, as I don't know how to enhance this proposal any further, neither on a technical basis nor on a marketing/explanatory basis, I reached a point where I can say if I haven't convinced you yet and users don't like it either, I am good to drop it and accept that D-packages don't map as nicely to package-manager packages as I thought they would.

The proposal builds on the assumption that the D-package is the natural way of grouping things together, if in practice this is not the case and it is frequent that you want to group things differently, then this proposal is in fact just making things more complex, instead of making them easier and should by all means be dropped.

I really would prefer such a system, where there is a clean relationship between package and name space, too. But at this point I would consider the issues that it creates plus the still unknown benefit as more serious. The fact that we haven't seen an existing system that works like this is another sign of warning.

But I think that just storing a list of occupied name spaces for each package in the registry will lead to nearly the same practical result even without the restriction, so things should not be much worse.
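A minimal sketch of that idea, in Python for brevity: each package record carries a list of the name spaces it occupies, and a reverse index answers "who occupies this name space?". The record layout, the function names and the package names are assumptions made up for this example, not the registry's actual schema.

```python
from collections import defaultdict


def namespace_index(registry):
    """Map each name space to the set of packages that occupy it."""
    index = defaultdict(set)
    for package, namespaces in registry.items():
        for namespace in namespaces:
            index[namespace].add(package)
    return index


def contested_namespaces(index):
    """Return the name spaces that are occupied by more than one package."""
    return {ns: sorted(pkgs) for ns, pkgs in index.items() if len(pkgs) > 1}


# Hypothetical registry content: package name -> occupied name spaces.
registry = {
    "vibe-d": ["vibe"],
    "lib-a": ["util"],
    "lib-b": ["util", "net"],
}

index = namespace_index(registry)
print(sorted(index["vibe"]))        # ['vibe-d']
print(contested_namespaces(index))  # {'util': ['lib-a', 'lib-b']}
```

The package name stays arbitrary ("vibe-d"), yet looking up a name space still leads to the package, and overlaps become visible without forbidding them.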

(Off topic: Did you send any mail today or yesterday by chance? I'm asking because my server had a harddrive failure after a power outage and I had to use a one day old backup)

Re: Should packages be restricted to a single name space?

Robert

Posted Tue, 12 Mar 2013 00:23:15 GMT in reply to Sönke Ludwig

I see a number of issues with the build process and cyclic dependencies. As one example, just assume that both such dependencies are compiled as a shared library and think about how linking will work.

I actually thought of this. The solution I had in mind was that, since packages with cyclic dependencies are virtually an inseparable entity, they would simply be built as a single shared library and, if distributed, shipped in a single zip file, with both package names referring to the same file. (It would probably make sense to restrict cyclic dependencies to packages stemming from the same repository.) So the issues are solvable, but it is really not worth the trouble if people don't package according to their D packages on a regular basis. It would only be acceptable if this were necessary just in some weird corner cases, where someone needs to stay absolutely compatible with some legacy code or something. The fact that you already want to package your software not according to your D packages, despite knowing all my great good reasons why this is a good idea, already proves that my proposal is not going to work for a public repository, despite all its nice properties :-(

But if the registry could instead just tell you which namespace(s) a package occupies or vice versa, you have a very similar effect (just that the user would have to look at a namespace list instead of a package list to see if a particular namespace is already occupied).

Could be done, the thing is, with my proposal it would not have been necessary. Also to offer a reliable possibility to check for conflicts, the registry has to keep a database of all file names with information about which package in what version(s) installs it. This database could then also be used to find out what package installs a particular import and also my "easy scripting" feature would make use of it.

It is more work and less straight forward, but with a good infrastructure and the right tools it would not be too bad. I guess this is what computers are there for.
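The database described here could be sketched roughly as follows (Python, for illustration only). It maps every installed file to the package releases that install it and uses that to answer "which package provides this import?". The data layout, the helper names, the package names and the version numbers are all hypothetical.

```python
def build_file_index(releases):
    """Map each file path to the set of (package, version) pairs installing it."""
    index = {}
    for package, version, files in releases:
        for path in files:
            index.setdefault(path, set()).add((package, version))
    return index


def packages_for_import(index, module_name):
    """Return the packages that install the file behind `import module_name;`."""
    path = module_name.replace(".", "/") + ".d"
    return sorted({package for package, _version in index.get(path, ())})


# Hypothetical releases: (package, version, files relative to the import root).
releases = [
    ("lib-a", "1.0.0", ["somelib/net/url.d", "somelib/util/array.d"]),
    ("lib-a", "1.1.0", ["somelib/net/url.d", "somelib/util/array.d",
                        "somelib/util/text.d"]),
    ("lib-b", "0.1.0", ["otherlib/core.d"]),
]

file_index = build_file_index(releases)
print(packages_for_import(file_index, "somelib.util.text"))  # ['lib-a']
print(packages_for_import(file_index, "unknown.module"))     # []
```

Because versions are recorded per file, the same index can also tell that `somelib/util/text.d` only exists from the hypothetical release 1.1.0 onwards.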

The length of the proposal gives a hint of what I mean with complexity. Even if some of it is just examples and some parts are specific to our discussion, it still contains a number of new concepts that every user has to understand on top of what is already there.

The basic concept is a one-liner: "Create a package by specifying its name, the corresponding D package in your source tree will be its contents."
If you know this you would already be set for most packages, another sentence about child packages and you are basically done. The document is long, because I am going quite a bit into details, including sample package.json files and because I am trying to prove my point, explaining why I do that and why I think it is a good idea. But ok I am biased, if it seemed to be complex to me, I would have dropped it before writing it even down.

I really would prefer such a system, where there is a clean relationship between package and name space, too. But at this point I would consider the issues that it creates plus the still unknown benefit as more serious. The fact that we haven't seen an existing system that works like this is another sign of warning.

But I think that just storing a list of occupied name spaces for each package in the registry will lead to nearly the same practical result even without the restriction, so things should not be much worse.

Except that we would need to store a list of all modules not only namespaces, as packages are no longer restricted to a particular namespace.

I have to think a bit more about the consequences of not implementing my proposal, but with the right infrastructure and tools available, most issues should be solvable. E.g. having an IDE showing me the containing package name when I hover over an import line, having a commit hook that calls dub with some parameter that checks for conflicts with the registry, ...

Just let me say it one more time, that this all wasn't necessary if only .... :-) Ok, I had to do it, it is hard giving up a baby.

(Off topic: Did you send any mail today or yesterday by chance? I'm asking because my server had a harddrive failure after a power outage and I had to use a one day old backup)

No, nothing. :-)

Re: Should packages be restricted to a single name space?

Sönke Ludwig

Posted Tue, 12 Mar 2013 08:45:46 GMT in reply to Robert

On 12.03.2013 01:23, Robert wrote:

I see a number of issues with the build process and cyclic dependencies. As one example, just assume that both such dependencies are compiled as a shared library and think about how linking will work.

I actually thought of this. The solution I had in mind was that, since packages with cyclic dependencies are virtually an inseparable entity, they would simply be built as a single shared library and, if distributed, shipped in a single zip file, with both package names referring to the same file. (It would probably make sense to restrict cyclic dependencies to packages stemming from the same repository.)

No no no! This is going severely in the wrong direction. Too much magic, and this is also not the only problem with cyclic dependencies. Patching like this will open up more holes and cause more and more special cases to be added (well, maybe there is an end to it somewhere).

So the issues are solvable, but it is really not worth the trouble if people don't package according to their D packages on a regular basis. It would only be acceptable if this were necessary just in some weird corner cases, where someone needs to stay absolutely compatible with some legacy code or something. The fact that you already want to package your software not according to your D packages, despite knowing all my great good reasons why this is a good idea, already proves that my proposal is not going to work for a public repository, despite all its nice properties :-(

Not "want", I appreciate the positive properties, but the fact that even I am still running into walls and more special cases would need to be added, plus the other mentioned issues, could just be a sign that this restriction may just not be a good idea after all. I hope that you can wake up one day and suddenly you realize how these restrictions have just restrained your life up to that point ;)

But seriously, at this point I could just repeat my previous mails about complexity, actual benefit (still nothing in sight) and judgment calls.

But if the registry could instead just tell you which namespace(s) a package occupies or vice versa, you have a very similar effect (just that the user would have to look at a namespace list instead of a package list to see if a particular namespace is already occupied).

Could be done, the thing is, with my proposal it would not have been necessary. Also to offer a reliable possibility to check for conflicts, the registry has to keep a database of all file names with information about which package in what version(s) installs it. This database could then also be used to find out what package installs a particular import and also my "easy scripting" feature would make use of it.

Well, in your case you also have to make sure that all packages adhere to the restrictions, you have to manage those :suffixes, implement all the new concepts like sub-packages first and whatnot. Just adding a list of D packages for each package record really is trivial and has the nice property that it can be done at a later point when needed for the scripting feature or when it shows that people create unwanted conflicts.

It is more work and less straight forward, but with a good infrastructure and the right tools it would not be too bad. I guess this is what computers are there for.

I totally disagree. This is far less work and much more straight forward. Try to list all the things that are necessary to implement for your proposal. If you are careful, you'll see that there is actually a lot of work in it.

The impact on the required hardware also is just not there. The storage requirements are negligible and the expensive part of checking the actually occupied name spaces of each package also has to be done in the restricted case.

The length of the proposal gives a hint of what I mean with complexity. Even if some of it is just examples and some parts are specific to our discussion, it still contains a number of new concepts that every user has to understand on top of what is already there.

The basic concept is a one-liner: "Create a package by specifying its name, the corresponding D package in your source tree will be its contents."
If you know this you would already be set for most packages, another sentence about child packages and you are basically done. The document is long, because I am going quite a bit into details, including sample package.json files and because I am trying to prove my point, explaining why I do that and why I think it is a good idea. But ok I am biased, if it seemed to be complex to me, I would have dropped it before writing it even down.

(I'll repeat my previous answer in response to your length argument: "Even if some of it is just examples and some parts are specific to our discussion, it still contains a number of new concepts that every user has to understand on top of what is already there.")

...and then you would have to start explaining the :slots and how the default is dealt with, reason about how cyclic dependencies are resolved with that special case, and start explaining how these help to adapt existing structures to the seemingly simple core concept.

It is simple, the core concept, it's fitting things into a model that they weren't made for that makes things complex.

I really would prefer such a system, where there is a clean relationship between package and name space, too. But at this point I would consider the issues that it creates plus the still unknown benefit as more serious. The fact that we haven't seen an existing system that works like this is another sign of warning.

But I think that just storing a list of occupied name spaces for each package in the registry will lead to nearly the same practical result even without the restriction, so things should not be much worse.

Except that we would need to store a list of all modules not only namespaces, as packages are no longer restricted to a particular namespace.

Well, depending on the use case that may be the case, or may be not (you can only consider actual module clashes as a conflict, or you can already count package clashes as conflicts). Anyway, I fail to see the problem, even when capturing all modules.

I have to think a bit more about the consequences of not implementing my proposal, but with the right infrastructure and tools available, most issues should be solvable. E.g. having an IDE showing me the containing package name when I hover over an import line, having a commit hook that calls dub with some parameter that checks for conflicts with the registry, ...

Sorry, but that's a totally unrealistic case you make up there. Please, show me where such conflicts happen in reality and we can talk about it, but constructing cases like this is just... after all we have concrete issues with the restricted case but still not one concrete example of issues with the unrestricted case (and there is a sheer unlimited set of examples in the world).

People will still usually adhere to only one root package per package manager package - it should also still be the recommendation to name the D package after its root package. And if some maniac decides to place his modules all over the global name space, surely the library will not be used and no one will ever notice the conflicts. Starting to look for conflicts per module is really over the top, IMO.

Just let me say it one more time, that this all wasn't necessary if only .... :-) Ok, I had to do it, it is hard giving up a baby.

I hear you, I used to see that clean package tree right before my eyes :)

...but then reality came in and destroyed everything ;)

I think a good analogy may be D vs. Lisp. Everything is so much cleaner when restricting everything to pure functions and that pure syntax, but at some point you realize that for certain things that's really just a burden and that the restriction may not be modeling reality sufficiently.

(Off topic: Did you send any mail today or yesterday by chance? I'm asking because my server had a harddrive failure after a power outage and I had to use a one day old backup)

No, nothing. :-)

Re: Should packages be restricted to a single name space?

Robert

Posted Tue, 12 Mar 2013 12:58:30 GMT in reply to Sönke Ludwig

It seems we already have conflicts:

https://github.com/Domain/java/tree/master/java

and

https://github.com/d-widget-toolkit/base/tree/master/src

I think people should at least be aware of such things, when uploading packages, so they can discuss whether the implementations could be merged or how they want to proceed. Having libraries depending on one implementation and others depending on the other one, seems really bad.

Maybe I could have a look at how to implement some conflict detection mechanism for the registry/dub? Would such a contribution be welcome? Basically what we already discussed: the registry would need to keep track of the files contained in packages.

Best regards,

Robert

Re: Should packages be restricted to a single name space?

Sönke Ludwig

Posted Tue, 12 Mar 2013 15:34:20 GMT in reply to Robert

On Tue, 12 Mar 2013 12:58:30 GMT, Robert wrote:

It seems we already have conflicts:

https://github.com/Domain/java/tree/master/java

and

https://github.com/d-widget-toolkit/base/tree/master/src

I think people should at least be aware of such things, when uploading packages, so they can discuss whether the implementations could be merged or how they want to proceed. Having libraries depending on one implementation and others depending on the other one, seems really bad.

I agree. However, the package==namespace approach would also just show that there is a conflict and not prevent it. The resolution still requires people to restructure their code.

Maybe I could have a look at how to implement some conflict detection mechanism for the registry/dub? Would such a contribution be welcome? Basically what we already discussed: the registry would need to keep track of the files contained in packages.

Definitely, especially considering that we need to scan packages for all contained D packages anyway, it will be good to have that in place. The question is how to best implement that performance wise:

Query github for each folder? Or maybe there is a function to return the directory structure?
Download the archived project .zip file, scan it, and delete it again?
Some other way? Could use libgit, but that would limit it to git...
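As a rough illustration of the second option (download, scan, delete), the scan step could look like the sketch below. It is written in Python rather than D purely for brevity and uses only the standard `zipfile` module. The download itself is left out, and the tiny in-memory archive with its file names is an invented stand-in for a downloaded project.

```python
import io
import zipfile


def d_files_in_zip(data):
    """List the paths of all .d files inside a zip archive given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return sorted(name for name in archive.namelist() if name.endswith(".d"))


# Build a small in-memory archive that stands in for a downloaded project zip.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("project-master/source/somelib/core/file.d",
                     "module somelib.core.file;")
    archive.writestr("project-master/README.md", "docs")
demo_archive = buffer.getvalue()

print(d_files_in_zip(demo_archive))
# ['project-master/source/somelib/core/file.d']
```

Since the archive is only held in memory here, nothing has to be deleted afterwards. A real implementation would still need to know where the import root (here `source/`) starts.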

The representation in the database would then just be something like a string[] DbPackage.packages; field that has all D packages in dot notation.
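For illustration, deriving such a list from the scanned file paths might look like this Python sketch. It assumes the paths are already relative to the import root and that every directory on the way to a `.d` file counts as a D package; `d_packages` and the sample paths are hypothetical.

```python
def d_packages(module_paths):
    """Return all D packages, in dot notation, implied by a list of .d file paths.

    Paths are assumed to be relative to the import root and to use '/'.
    """
    packages = set()
    for path in module_paths:
        parts = path.split("/")
        if not parts[-1].endswith(".d"):
            continue
        # Every directory prefix of the module path is a D package.
        for depth in range(1, len(parts)):
            packages.add(".".join(parts[:depth]))
    return sorted(packages)


# Hypothetical file listing of one package.
paths = ["somelib/core/file.d", "somelib/net/url.d", "app.d", "README.md"]
print(d_packages(paths))  # ['somelib', 'somelib.core', 'somelib.net']
```

The resulting list is what a `packages` field of this kind would hold for the example listing.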

Re: Should packages be restricted to a single name space?

Sönke Ludwig

Posted Tue, 12 Mar 2013 16:43:57 +0100 in reply to Sönke Ludwig

Started to add issues for planned features (captured this as https://github.com/rejectedsoftware/dub-registry/issues/7)

Re: Should packages be restricted to a single name space?

Robert

Posted Tue, 12 Mar 2013 16:38:55 GMT in reply to Sönke Ludwig

Definitely, especially considering that we need to scan packages for all contained D packages anyway, it will be good to have that in place.

Perfect :-) It is time for me to become acquainted with the registry code base!

The question is how to best implement that performance wise:

Query github for each folder? Or maybe there is a function to return the directory structure?

Download the archived project .zip file, scan it, and delete it again?

Some other way? Could use libgit, but that would limit it to git...

Hmm, without very much thinking, using libgit seems to be the best way, writing some abstraction so that the back-end is exchangeable for other vcs should not be hard. Also I have a feeling that this would be the most efficient, but we'll see.
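The abstraction mentioned here could be as small as one interface with a "list all files" operation. The Python sketch below shows the shape of it. All names are hypothetical, and instead of a libgit-based back-end (whose API is not assumed here) it ships a plain directory back-end so that the example stays runnable.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class SourceBackend(ABC):
    """Anything that can enumerate the files of a package's source tree."""

    @abstractmethod
    def list_files(self):
        """Return all file paths relative to the tree root, using '/'."""


class DirectoryBackend(SourceBackend):
    """Back-end for a checked-out or unpacked tree on the local disk."""

    def __init__(self, root):
        self.root = root

    def list_files(self):
        found = []
        for folder, _dirs, files in os.walk(self.root):
            for name in files:
                full = os.path.join(folder, name)
                rel = os.path.relpath(full, self.root)
                found.append(rel.replace(os.sep, "/"))
        return sorted(found)


def d_files(backend):
    """Scanning code only depends on the interface, not on the VCS."""
    return [path for path in backend.list_files() if path.endswith(".d")]


# Demo with a temporary directory standing in for a repository checkout.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "source", "somelib"))
    for rel in ("source/somelib/core.d", "README.md"):
        with open(os.path.join(root, *rel.split("/")), "w") as handle:
            handle.write("")
    demo_files = d_files(DirectoryBackend(root))

print(demo_files)  # ['source/somelib/core.d']
```

A git, zip or other VCS back-end would then only have to implement `list_files`.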

The representation in the database would then just be something like a string[] DbPackage.packages; field that has all D packages in dot notation.

I think just a list of packages is not enough; we need to store the name of every file to detect conflicts. Two packages installing into the same D-package can be perfectly fine, thus we need to go down to the module level for both conflict detection and mapping source files to packages.
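A sketch of what checking at the module level means, in Python and with invented package and module names: two packages may share a D package as long as no fully qualified module is provided by both.

```python
def module_conflicts(package_modules):
    """Return the modules that are provided by more than one package."""
    owners = {}
    for package, modules in package_modules.items():
        for module in modules:
            owners.setdefault(module, []).append(package)
    return {m: sorted(pkgs) for m, pkgs in owners.items() if len(pkgs) > 1}


# Hypothetical packages: all three install into the D package somelib.util.
package_modules = {
    "lib-a": ["somelib.util.array", "somelib.util.text"],
    "lib-b": ["somelib.util.hash"],
    "lib-c": ["somelib.util.text"],
}

print(module_conflicts(package_modules))
# {'somelib.util.text': ['lib-a', 'lib-c']}
```

Here lib-a and lib-b coexist in `somelib.util` without a problem, while lib-a and lib-c clash on one module. A package-level list alone could not tell these two cases apart.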

On a different matter, should we also warn about conflicts with -J files?

Best regards,

Robert

Re: Should packages be restricted to a single name space?

Sönke Ludwig

Posted Tue, 12 Mar 2013 16:55:20 GMT in reply to Robert

On Tue, 12 Mar 2013 16:38:55 GMT, Robert wrote:

Definitely, especially considering that we need to scan packages for all contained D packages anyway, it will be good to have that in place.

Perfect :-) It is time for me to become acquainted with the registry code base!

The question is how to best implement that performance wise:

Query github for each folder? Or maybe there is a function to return the directory structure?

Download the archived project .zip file, scan it, and delete it again?

Some other way? Could use libgit, but that would limit it to git...

Hmm, without very much thinking, using libgit seems to be the best way, writing some abstraction so that the back-end is exchangeable for other vcs should not be hard. Also I have a feeling that this would be the most efficient, but we'll see.

Yeah, it depends if it has efficient means to list all files. I haven't seen one, but I haven't searched for it either.

The representation in the database would then just be something like a string[] DbPackage.packages; field that has all D packages in dot notation.

I think just a list of packages is not enough; we need to store the name of every file to detect conflicts. Two packages installing into the same D-package can be perfectly fine, thus we need to go down to the module level for both conflict detection and mapping source files to packages.

I thought about warning just for leaves of the package tree, so e.g. those two would conflict:

somelib.subpack1
somelib.subpack2

somelib.subpack2
somelib.subpack3

But this one wouldn't:

somelib.subpack1.subsubpack

Another probably better idea would be to only include packages in the list that actually contain modules. That would be a more conservative approach, where it can still be guaranteed that there are no conflicts, but still it should be sufficiently flexible.
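Both ideas can be sketched briefly (Python, illustration only). `leaf_packages` keeps just the leaves of a package tree, and `packages_with_modules` keeps only the D packages that directly contain a module. The function names and the module names in the second example are assumptions; the package lists are the ones from the example above.

```python
def leaf_packages(packages):
    """Return the packages that have no sub-package within the same list."""
    return {p for p in packages
            if not any(q.startswith(p + ".") for q in packages)}


def leaf_conflicts(first, second):
    """Return the leaf packages that two package-manager packages share."""
    return sorted(leaf_packages(first) & leaf_packages(second))


def packages_with_modules(modules):
    """Return the D packages that directly contain at least one module."""
    return sorted({m.rsplit(".", 1)[0] for m in modules if "." in m})


# The three package lists from the example above.
first = ["somelib.subpack1", "somelib.subpack2"]
second = ["somelib.subpack2", "somelib.subpack3"]
third = ["somelib.subpack1.subsubpack"]

print(leaf_conflicts(first, second))  # ['somelib.subpack2']
print(leaf_conflicts(first, third))   # []

# Hypothetical module names for the second idea.
print(packages_with_modules(["somelib.subpack1.a", "somelib.subpack2.b"]))
# ['somelib.subpack1', 'somelib.subpack2']
```

In the second variant `somelib` itself never shows up in the list, because it contains no module directly, so two packages that merely share that root would not be flagged.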

On a different matter, should we also warn about conflicts with -J files?

I tend to say no, because they have a defined override behavior and this is very useful to customize Diet templates of dependencies. A more explicit way would be desirable, but I don't know how that should work without uglifying the API.

Regards, Sönke
