8 comments

  • mofeing 11 days ago
    hey,

    We ran into the same problem (a supercomputer with clusters of different architectures and no outgoing connections permitted), so we created "pypickup" [1,2]. Nice to see that we came up with similar solutions! I have some questions:

    1. is the directory of packages you create compatible with PEP 503? (so I can use the `--index-url file://PATH_TO_LOCAL_CACHE` flag with pip and it should work)

    2. is there some filtering mechanism? e.g. we are not interested in non-release versions ("dev" versions, "rc" versions, "post" versions, ...)

    3. I guess that the way morgan resolves dependencies is by manually parsing files like "pyproject.toml" or "requirements.txt" and it does not ask the build-system for the dependencies. if so...

       - does "morgan" detect build-dependencies?
    
       - which build-systems are compatible?
    
       - is "morgan" capable of detecting more complex dependency specifications? e.g. "oldest-supported-numpy", which is used by "scipy", has dependency strings like the following: numpy==1.19.2; python_version=='3.8' and platform_machine=='aarch64' and platform_python_implementation != 'PyPy'
    
    kudos for the good work

    [1] https://pypi.org/project/pypickup/ [2] https://github.com/UB-Quantic/pypickup

    • idop 11 days ago
      Too bad your project didn't come up in any of my searches while researching this problem. Probably because it doesn't use the word "mirror" at all :)

      As for your questions:

      1. I don't see any mention of directory structures in PEP 503. The Morgan server does implement PEP 503 though. In any case, I tried installing now straight from the directory and it didn't work. Are you sure you meant PEP 503?

      2. Where Morgan differs from pypickup, as I can see, is that it interprets requirement strings as per PEP 508 (e.g. "requests>=2.40.0; python_version < '3.8'") instead of providing a command such as `pypickup add requests`. For every requirement string, it looks for the latest version in PyPI that satisfies it, and downloads that version. You can filter _in_ the requirement strings, other than that Morgan doesn't have any specific handling of dev/rc/etc.

      3. Morgan detects and downloads the build system based either on the [build-system] section of pyproject.toml, or the setup_requires.txt file (from setuptools). These are the sources currently supported. It doesn't actually care what the build system is, it simply attempts to find where it is defined and download it as well.

      As for complex dependency specifications, yes, they are supported and honored (Morgan relies on the "packaging" library to properly evaluate those). By the way, I recently moved from Poetry to Hatch for managing the Morgan project itself specifically because I got fed up with Poetry not honoring those specifications, and trying to download completely irrelevant packages.
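
For readers who want to see what "evaluating" such a requirement string looks like, here is a minimal sketch using the "packaging" library mentioned above. It is not Morgan's actual code: the helper `applies_to` and the two target environments are hypothetical names made up for illustration, and only the requirement string comes from the thread.

```python
from packaging.requirements import Requirement

# Requirement string quoted earlier in the thread (from oldest-supported-numpy).
REQ = (
    "numpy==1.19.2; python_version=='3.8' and platform_machine=='aarch64' "
    "and platform_python_implementation != 'PyPy'"
)


def applies_to(requirement: str, environment: dict) -> bool:
    """Return True if the requirement's marker holds for the given environment."""
    req = Requirement(requirement)
    # A requirement without a marker applies to every environment.
    if req.marker is None:
        return True
    # Keys passed here override the values detected for the running interpreter.
    return req.marker.evaluate(environment)


# Hypothetical target environments (assumptions, not Morgan's config format).
arm_py38 = {
    "python_version": "3.8",
    "platform_machine": "aarch64",
    "platform_python_implementation": "CPython",
}
x86_py310 = {
    "python_version": "3.10",
    "platform_machine": "x86_64",
    "platform_python_implementation": "CPython",
}

req = Requirement(REQ)
print(req.name, req.specifier)
print(applies_to(REQ, arm_py38))
print(applies_to(REQ, x86_py310))
```

Because the environment is passed in explicitly, the mirroring machine does not have to match the machines that will later install from the mirror.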

      • mofeing 11 days ago
        Well, we first named it "pypi-cache" but there is a package named "pypicache" from the year 2007 and we had to rename it. We always thought of it as a "cache" rather than a "mirror"... but yes, "mirror" is more appropriate. Btw we released it just 1 week ago which is also maybe why you did not find it.

        1. Well, the flag "--index-url" explicitly says that "... should point to a repository compliant with PEP 503 (the simple repository API) or a local directory laid out in the same format". PEP 503 defines the directory structure where there is a folder per package, an "index.html" on the root with a link to each package and *an "index.html" in each package folder that has a link per available file*.

        URLs are not limited to "https", they can also be relative paths. So the trick we do is to download the file to the folder of the package and add an anchor to that file in the "index.html" of the package. For example,

        If you go to https://pypi.org/simple/numpy, you will find links like the following: <a href="https://files.pythonhosted.org/packages/f6/d8/ab692a75f584d1..." data-requires-python=">=3.8">numpy-1.22.4.zip</a>

        But we download it and write, <a href="./numpy-1.22.4.zip" data-requires-python=">=3.8">numpy-1.22.4.zip</a>

        This is especially important for us because we cannot set up any kind of server.
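
The layout described above (one folder per package, a root "index.html", and a per-package "index.html" with relative links) can be sketched roughly as follows. This is an illustration under assumptions, not pypickup's implementation: the function names are hypothetical, and attributes such as `data-requires-python` are left out for brevity.

```python
import html
import tempfile
from pathlib import Path

PAGE = "<!DOCTYPE html>\n<html>\n  <body>\n{links}\n  </body>\n</html>\n"


def _write_index(directory: Path, links: list[tuple[str, str]]) -> Path:
    """Write directory/index.html with one anchor per (href, text) pair."""
    anchors = "\n".join(
        f'    <a href="{html.escape(href, quote=True)}">{html.escape(text)}</a><br/>'
        for href, text in links
    )
    index = directory / "index.html"
    index.write_text(PAGE.format(links=anchors), encoding="utf-8")
    return index


def write_package_index(package_dir: Path) -> Path:
    """List every downloaded file in a package folder through a relative link."""
    files = sorted(
        p.name for p in package_dir.iterdir()
        if p.is_file() and p.name != "index.html"
    )
    return _write_index(package_dir, [(f"./{name}", name) for name in files])


def write_root_index(mirror_root: Path) -> Path:
    """List every package folder in the mirror root through a relative link."""
    packages = sorted(p.name for p in mirror_root.iterdir() if p.is_dir())
    return _write_index(mirror_root, [(f"{name}/", name) for name in packages])


# Small demo in a temporary directory standing in for the shared mirror folder.
with tempfile.TemporaryDirectory() as tmp:
    mirror = Path(tmp)
    (mirror / "numpy").mkdir()
    (mirror / "numpy" / "numpy-1.22.4.zip").touch()
    print(write_package_index(mirror / "numpy").read_text(encoding="utf-8"))
    write_root_index(mirror)
```

A directory written this way is the kind of target the `--index-url file://PATH_TO_LOCAL_CACHE` flag from the first comment is meant to point at, with no server process involved.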

        2. Okay nice. Yep, we thought that parsing would be more difficult and that relying on parsing would be problematic due to the different build-systems and that many packages still do not have the "pyproject.toml" file. We opted for a manual approach in which you do "pypickup add" until you have no more "dependency missing" errors. Your approach looks much better to me, but like you said is limited to "pyproject.toml" and "setuptools" right now.

        Btw, does it also download extra dependencies?

        3. Nice. I also stopped using Poetry for things like that, but now I manually write my "pyproject.toml" with "setuptools".

        I like the idea of trying to parse the dependencies. I will probably try something but since we download all files (filtering some of them), it would be more costly. Maybe in some weeks when I'm more free.

        • idop 11 days ago
          Ahh, I get it, it needs index.html files. I can easily implement this, but I actually did want the server because I wanted it to be easily accessible from multiple machines, I also wanted to implement the JSON API, and also want (in an upcoming version) to allow uploading private packages to the mirror.

          As for extra dependencies, yes, they will be mirrored, but only if relevant, i.e. if they are included in a requirement string (be it a direct requirement or a dependency of a dependency).

          • mofeing 10 days ago
            Ahh ok. In our case all the machines have a shared network filesystem where we store the mirror.

            Great about the extras.

            Would you mind if we reference each other in the readmes?

            • idop 10 days ago
              Yeah sure, no problem.
  • Galanwe 11 days ago
    Maybe I'm confused about what this offers, but I have been running private pypi repositories for a decade now, and it never required more than running an HTTP server with directory listing.

    As for doing partial mirroring of pypi with only what you are using, is that really a good idea anyway? it will break whenever you add or change any dependency.

    • idop 11 days ago
      The problem isn't really on the serving side, it's on the mirroring side. Trying to mirror PyPI - at its current 13.4 TB size[1] - and bringing all those terabytes into a restricted network with security policies and no access to the internet, is impossible. A partial mirror is the only way to go for such a use case, and given that Morgan automatically resolves and mirrors dependencies, adding a new dependency shouldn't break anything.

      [1] https://pypi.org/stats/

      • vasco 11 days ago
        Can't you resolve the dependencies by running pip download when you have internet and later serving that directory with a local HTTP server as the parent suggested? Pip download will resolve all the dependencies for you already the same way as pip install would.
        • idop 11 days ago
          No, as I mention both in this post and in the README. Pip will download binary distributions (wheels) that were compiled for the system it is running on. If my mirror is meant to serve a different version of Python installed on a different OS with a different libc (or other such differences), then it won't work. I could try to match the target environment on the mirroring side, say with Docker, but this is either cumbersome or still not possible if you have legacy environments from years before.
        • nijave 11 days ago
          You can download as source packages instead of wheels but then you need to make sure you have all the requisite compilers and libraries needed. This isn't an issue for Python-only dependencies but can be difficult for dependencies with lots of native code like numpy/pandas where you need a C toolchain & Fortran toolchain installed (and possibly other libs)

          If you're using something like Docker/containers, you can download the dependencies inside the container and be reasonably sure you get the right wheels. This becomes trickier when you have different setups like developers on Windows and production on Linux.
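
To make the wheel-compatibility point above concrete: a wheel's filename encodes the Python, ABI, and platform it was built for, in the form name-version[-build]-python-abi-platform.whl. The sketch below only splits those tags out; the helper and class names are hypothetical and the filenames are illustrative examples.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    python: str
    abi: str
    platform: str


def wheel_tags(filename: str) -> WheelTags:
    """Split the compatibility tags out of a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # name-version[-build]-python-abi-platform: five or six dash-separated fields.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    python, abi, platform = parts[-3:]
    return WheelTags(python, abi, platform)


# A compiled wheel is tied to one interpreter, ABI, and platform family...
print(wheel_tags("numpy-1.22.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"))
# ...while a pure-Python wheel is tagged as working anywhere.
print(wheel_tags("requests-2.28.1-py3-none-any.whl"))
```

A wheel fetched on the mirroring machine is only useful to a target whose interpreter accepts those three tags, which is why downloading on one system for another is not automatic.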

      • 5d8767c68926 11 days ago
        Now I am curious.

        - how big is it if you exclude non-Python3 compatible?

        - how big if you only wanted the latest version of everything?

    • jamescampbell 11 days ago
      Came here to say this. I run private pypi repositories for this use case and it works fine. I've had to thumbdrive over all of our dependencies, wheels and so on. A single bash script runs all the checks, downloads everything and zips it up for the offline environment; then you use pip install like normal, with the login creds for your offline pypi registry.
    • colpabar 11 days ago
      Out of curiosity, how do you run yours?
  • hackish 11 days ago
    Thanks for posting this. I'm going to give setting up Morgan a shot when I've got some free cycles.

    I'd hesitantly accepted the risk of serving a devpi server over vsock and into my (personal) restricted VLAN. I did so because using a shared folder meant I'd need to have cached the module and any dependencies from my internet-connected VLAN first.

    Combined with debmirror[0], vscodeoffline[1], and some nightly snatcher shell scripts, I think I have most of my needs covered.

    [0] https://help.ubuntu.com/community/Debmirror

    [1] https://github.com/LOLINTERNETZ/vscodeoffline

  • uranusjr 8 days ago
    This is pretty cool. I created simpleindex[1] a while ago to solve a different problem, but since the solution is essentially also running a custom index server, it has several overlapping functionalities to Morgan’s server script. I wonder if there’s a common pattern that can be extracted out…

    BTW I also maintain resolvelib (mentioned in another comment), feel free to shoot any questions in the issue tracker or the PyPA Discord[2], or any other means. The documentation is a bit sparse and there are not many resources on dependency resolution in general, and there’s a few of us that help each other out on things.

    [1]: https://github.com/uranusjr/simpleindex [2]: https://discord.com/invite/pypa

  • jvolkman 11 days ago
    This looks similar to some Bazel rules I'm working on. I'm also using the approach of defining target environments up front [1], but the main difference is that I'm currently offloading the actual resolution process to Poetry or PDM, which both generate cross-platform lock files.

    But Poetry and PDM don't add build dependencies to lock files - which I need - so I'm thinking of building a custom resolver.

    Did you consider using resolvelib [2], which is what underlies both pip and PDM?

    [1] https://github.com/jvolkman/rules_pycross/blob/main/examples...

    [2] https://github.com/sarugaku/resolvelib

    • idop 11 days ago
      By the way, Poetry's dependency resolution isn't that great. It doesn't properly evaluate optional dependencies. For example, when I try to install pymongo on Linux, it will insist on installing pywin32 as well, even though it is completely irrelevant. It's given me a lot of headaches.
    • idop 11 days ago
      I didn't know about resolvelib, looks interesting, I'll have to give it a deeper look, thanks.
  • skbly7 11 days ago
    Thanks for creating it and looking forward to trying it out.

    I have been looking for a similar solution, and the whitelist used to fail with other tools as they weren't resolving the dependencies.

  • danrocks 11 days ago
    When I worked at Microsoft, one team created a big solution for an e-commerce customer using Kubernetes, Helm charts, etc. Beautiful.

    Then I had to take it to run in mainland China.

    Nope.

  • indrora 11 days ago
    Oh neat. Not only do I share a name with a project, it's a project I was seriously thinking of starting.
    • idop 11 days ago
      :) Naming projects is hard, so I tend to give it as little thought as possible. I was playing Red Dead Redemption 2 while writing the first version of this so I just named it after Arthur Morgan, the main protagonist.
      • arthurcolle 11 days ago
        I figured it was someone lamenting working at Morgan Stanley for not letting you pull in dependencies without a lot of red tape ;)
        • coredog64 11 days ago
          When I was at Morgan I saw three or four people create something like this. :)