16 comments

  • huydotnet 317 days ago
    Great product! I gave it a try and was really impressed!

    One thing: could your team add an explanation somewhere on the home page to help users understand how their private code is being handled (locally and when chat requests happen)?

    The only place I could find that information is the HN launch thread; it'd be nice if it were available on the home page!

    • louiskw 317 days ago
      Yes, have been meaning to do this! We’ve been super focussed on the app itself, often at the expense of the marketing site.

      Glad to hear you found it helpful, I’m around if you have any issues.

      • brandall10 317 days ago
        I'm not currently working for a larger org... but I can certainly say that with my last employer, having this explicitly documented up front would have been crucial for adoption.
  • anotherpaulg 317 days ago
    This looks super interesting! You mentioned that you'll be sharing some more details on the approach. I'm looking forward to learning more about how you're using the user's query to select relevant code to share with GPT-4.

    I have been working on a related problem for my open source GPT coding tool. I am more focused on creating a GPT chat experience that can act as a "junior developer" to write and edit code in your git repo. GPT is great at writing fresh, new self-contained code. I have been trying to improve its ability to make changes to a larger, complex repo.

    I wrote up some notes on my current approach and some ideas for future work. I'll be curious to see how it compares to your approach, which seems more focused on search and code analysis.

    https://aider.chat/docs/ctags.html
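
    For anyone who doesn't want to click through, here's a very rough sketch of the idea (illustrative only, not the actual aider implementation; it assumes universal-ctags is installed):

      # Rough sketch of a ctags-based repo map: list each file with the
      # identifiers defined in it, so the model sees the shape of the repo
      # without the full source. Illustrative only.
      import subprocess
      from collections import defaultdict

      def repo_map(root: str = ".") -> str:
          # `ctags -R -x` prints one cross-reference line per definition:
          # name, kind, line number, file, source line (universal-ctags).
          out = subprocess.run(["ctags", "-R", "-x", root],
                               capture_output=True, text=True, check=True).stdout
          by_file = defaultdict(list)
          for line in out.splitlines():
              parts = line.split(None, 4)
              if len(parts) >= 4:
                  name, kind, _lineno, path = parts[:4]
                  by_file[path].append(f"{name} ({kind})")
          return "\n".join(f"{path}: {', '.join(names)}"
                           for path, names in sorted(by_file.items()))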

    • louiskw 317 days ago
      Very cool. I imagine your approach using files/folders + identifiers would work well for small repos, where everything can fit within an 8k prompt. An early prototype of our agent had something similar, but with just files/folders, no identifiers.

      Our working thesis atm is that the only approach to context that scales to massive repos is embedding- or text-based retrieval.

      Code search is going to need to be solved before code editing gets solved. You can't make changes if you can't find where in the repo the changes need to be made, unless you're willing to make the UX concession that users have to manually select which files need to be edited.
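
      To make the embedding side concrete, the general shape is something like this (a simplified sketch with a toy stand-in embedding; our real pipeline uses our own embedding model and differs in the details):

        # Simplified sketch of embedding-based code retrieval. The toy hashing
        # "embedding" stands in for a real code-embedding model.
        import numpy as np

        def embed(text: str) -> np.ndarray:
            # Stand-in: hash tokens into a fixed-size, L2-normalised vector.
            vec = np.zeros(256)
            for tok in text.split():
                vec[hash(tok) % 256] += 1.0
            norm = np.linalg.norm(vec)
            return vec / norm if norm else vec

        def build_index(chunks: list[str]) -> np.ndarray:
            # Embed every code chunk once, up front, at indexing time.
            return np.stack([embed(c) for c in chunks])

        def search(query: str, chunks: list[str], index: np.ndarray, k: int = 5) -> list[str]:
            # Embed the natural-language query and rank chunks by cosine similarity
            # (vectors are already normalised, so a dot product is enough).
            sims = index @ embed(query)
            return [chunks[i] for i in np.argsort(-sims)[:k]]

      The retrieval step itself is the easy part; most of the work is in how you chunk the code and which embedding model you use.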

  • wanderingmind 317 days ago
    Looks interesting. A couple of simple questions:

    1. For people with access to only the GPT-3.5 API, does this still work?

    2. Also, does it work for repositories not hosted on GitHub (e.g. GitLab, Bitbucket)?

    • louiskw 317 days ago
      1. The agent loop runs on GPT4, so the prebuilt binaries come with access to our own OpenAI key (via our hosted proxy).

      2. It works with any git repo; you just have to clone it to your local machine first.

  • nickthesick 317 days ago
    This looks really interesting; I wonder if we can get to the point of having this all be local to your machine. The idea of using embeddings to search for relevant code snippets is maybe obvious to some, but as someone just now getting into this stuff, it blew my mind!

    I find that the hardest problems to track down are the interfaces between projects, like a bad frontend call into a backend, or backend to backend. I wonder if this could index separate projects and draw links between them.

    • louiskw 317 days ago
      Semantic relationships between backend/frontend or microservices are super interesting.

      We’re not far off. For example, if you index bloop itself with bloop and ask “What message does the backend send to indicate the frontend should close the eventsource?”, bloop will return a decent answer which takes into account the relevant frontend and backend code.

      This is an active area of improvement.

  • jordn 318 days ago
    What have been some of your learnings for getting agents to work?
    • louiskw 317 days ago
      Generate as few tokens as possible: GPT4 runs a few times to generate a single answer, and latency quickly becomes the biggest UX issue.

      We abandoned most of the common thinking around chain of thought reasoning, finding it didn’t help accuracy much whilst increasing response times significantly.

      Full write-up to follow in the next week or so.

      • reasonabl_human 317 days ago
        Does this mean your queries are all one-shot instead of utilizing techniques like LangChain?
        • louiskw 317 days ago
          Exactly, you can see the prompt in this file [0]. I'm not sure how LangChain arrived at their default agent prompt, but you'll almost certainly want to write your own for performance reasons if you put something into production.

          [0] https://github.com/BloopAI/bloop/blob/main/server/bleep/src/...
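
          To give a sense of the shape (illustrative only, not our actual prompt or code; it assumes the pre-1.0 openai Python client and an OPENAI_API_KEY in the environment):

            # Illustrative hand-rolled agent step: one chat completion picks the next
            # action as JSON, with no chain-of-thought scratchpad and no framework.
            # Not bloop's actual prompt; assumes the pre-1.0 `openai` Python client.
            import json
            import openai

            PROMPT = """You are a codebase assistant. Pick exactly one action and reply
            with JSON only: {{"action": "code_search" | "path_search" | "answer", "arg": "..."}}

            Question: {question}
            Results so far:
            {context}
            """

            def agent_step(question: str, context: str) -> tuple[str, str]:
                resp = openai.ChatCompletion.create(
                    model="gpt-4",
                    temperature=0,
                    messages=[{"role": "user",
                               "content": PROMPT.format(question=question, context=context)}],
                )
                decision = json.loads(resp["choices"][0]["message"]["content"])
                return decision["action"], decision["arg"]

          The surrounding loop just executes whichever tool the model picked, appends the result to the context and calls this again until the action is "answer", which is why keeping generated tokens to a minimum matters so much for latency.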

          • anotherpaulg 317 days ago
            It's great that you got gpt-4 to explore the codebase using an agent approach. I tried this previously with gpt-3.5-turbo and have been meaning to revisit it since I got gpt-4 access.

            I shared some notes on HN a while back on a variety of experiments I did with gpt-3.5-turbo.

            https://news.ycombinator.com/item?id=35441666

  • CharlesW 317 days ago
    Does Bloop play in the same space as GitHub Copilot Chat? https://code.visualstudio.com/docs/editor/artificial-intelli...
    • louiskw 317 days ago
      I haven’t tried Copilot Chat, but I imagine the key difference is the context: bloop is tuned to answer questions from anywhere in a repo, while Copilot Chat uses the context of what you’re looking at in the IDE.
      • CharlesW 317 days ago
        > bloop’s tuned to answer questions from anywhere in a repo…

        GitHub Copilot Chat purportedly (I'm also waiting for access) works across files in a workspace, which typically map to a repo.

        • demarq 317 days ago
          In my experience it can only take into account files you have open.

          So if you need it to call a function in another file that hasn’t been called in the current file, you’d have to open the other file in a new tab, then go back to the original and finally get the correct completion.

  • imjonse 317 days ago
    Can it work without being granted access to the user's GitHub? Getting authorization to private repos should not be necessary to work with random public repos on GitHub or local code only.
    • louiskw 317 days ago
      GitHub OAuth doesn't have a way to limit scope to specific repos, but the token is stored locally on your device, and the app's logic is that only repos you explicitly select are synced.

      It's also a condition of many LLM providers that application end users are authenticated to prevent abuse, so GitHub auth helps with this.

  • Solvency 317 days ago
    When you say "your code", could I run this on anyone's GitHub project? What if I want to ask questions about how some code in an emulator works, or Doom, or about Vue?
    • IChrisI 317 days ago
      This would be great, actually. I couldn't necessarily feed my company's code to this due to licensing concerns, but I'd love to point this at a Minecraft mod and ask how block X works, or if there's a console command to do Y, or how to construct a weapon with the most damage, etc.
    • louiskw 317 days ago
      Yes, I know that a decent proportion of the community uses bloop to understand open source repos. It can be especially helpful for repos that lack documentation.
  • meghan_rain 317 days ago
    > ... uses GPT4

    when will people learn not to send their entire IP to Microsoft?

    • cddotdotslash 317 days ago
      Microsoft already owns GitHub. This is probably near the bottom of things to be concerned about.
      • ChatGTP 317 days ago
        Are you implying that having your code on GitHub means Microsoft has access to your IP?
        • ranguna 317 days ago
          Yes, that's how they trained Copilot, and that's why they're currently on trial over it (but that damage has been done).
          • ChatGTP 317 days ago
            Where can I read about this? They're on trial for training copilot on private repos? This is huge.
            • BHSPitMonkey 316 days ago
              Your questions in this thread aren't making a lot of sense. Microsoft owns GitHub, so obviously hosting your code there means "sending your intellectual property to Microsoft" in some sense.
              • ChatGTP 316 days ago
                Of course it does, but when I pay for a private repository, I expect it to be private. I've never seen or heard of any evidence that my private code is used to train ChatGPT-4 and/or Copilot. So with all due respect, you and the parent aren't making much sense.

                It would be an incredible breach of trust if MS was found to be doing this.

    • louiskw 317 days ago
      We use OpenAI directly, not Azure
      • replwoacause 317 days ago
        But isn’t OpenAI funded by Microsoft in the form of Azure compute?
        • louiskw 317 days ago
          That’s not how funding works; Microsoft can’t read your data.
          • replwoacause 317 days ago
            Yeah ok, I was just pointing out that the data is ultimately being sent to Microsoft owned infra.
      • chaxor 317 days ago
        I think the point being made by using "Microsoft" here was basically that it's a stand-in for "other random people".

        I.e. anything 'cloud'.

  • glinkot 317 days ago
    Love the idea of this - but it'd be great to have a list of the languages supported (I'd be wanting C#; it's not listed in the 'treesitter bindings' you've linked to, and I'm not sure if that means it isn't supported).

    And of course, yes, I'd need to be able to provide my GPT-4 API key.

    • louiskw 317 days ago
      Even if C# isn't supported for code navigation, you should still be able to ask questions as the LLMs will have seen C#.

      If you want to try it out today (not using your own API key), you could sync something open source.

  • d4rkp4ttern 317 days ago
    I really like how the product looks, congrats! Curious how your "write a dockerfile" example would work -- would it write a dockerfile completely autonomously, or (more likely) involve multiple iterations of the agent + human?
    • louiskw 316 days ago
      It's one shot at the moment, but this isn't by design and may change, as generating code hasn't been our core focus.
  • csfyrakis 317 days ago
    Very nice. Can you hook in your own LLM (e.g. BLOOM, T5, etc.)?
    • louiskw 317 days ago
      We’re experimenting with this, and the answer is yes and no.

      GPT4 is the only model that can just about run the agent execution, mainly due to context length and quality.

      We use our own model for the embedding-based code retrieval, and will be replacing some of the GPT3.5 calls with fine-tuned models over the coming months.

      • ranguna 317 days ago
        It would be great if the app allowed connecting to local LLMs like text-generation-webui. As for quality, it's up to the user to choose their LLM, so I don't see this as relevant.
    • reasonabl_human 317 days ago
      Also interested. Would be really great to run this against a self hosted LLM agent.
  • d4rkp4ttern 317 days ago
    Curious what was your rationale to do this in Rust vs Python? Would be instructive to understand the trade-offs you considered. Thanks
    • louiskw 316 days ago
      I'm sure the other engineers on the project will have their own opinions here, but for me there are the obviously visible parts of the project (the prompt, tools, ...) and the invisible parts (indexing, tokenising/chunking, parallelising, streaming, ...).

      Building agents is an experimental process. You test an approach, maybe it works, and there's not always an obvious reason why certain experiments fail or succeed. We built three prototype agents in Python and JS, because those languages favour scrappy, fast iteration. This helped us quickly nail down our approach to the 'visible' parts.

      Once we nailed down our approach, we rebuilt the agent in Rust because the speed and safety favoured all the 'invisible' parts of the project.

      • d4rkp4ttern 316 days ago
        This is a very interesting perspective, thanks for sharing
  • private1215 315 days ago
    Interesting product! Looking forward to hearing about how the agent works. Curious about the embedding process.
  • kesor 317 days ago
    Shameless plug - https://github.com/kesor/chatgpt-code-plugin - also does a very good job of telling you anything you'd like to know about your code. Shared it as a Show HN as well: https://news.ycombinator.com/item?id=36099507
    • louiskw 316 days ago
      No shame at all, very cool! We wanted to go beyond the chat interface, and we have a complicated history management system that ensures answers are reliable in long threads, which wouldn't work with ChatGPT as it removes messages chronologically.
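
      Purely to illustrate the difference (this is not our actual system): instead of always dropping the oldest messages when you hit the context limit, you can drop the ones least useful to the current question, e.g.:

        # Hypothetical illustration, not bloop's history system: trim a long thread
        # to a token budget by keeping pinned and most-relevant turns rather than
        # simply dropping the oldest messages.
        def trim_history(turns: list[dict], budget: int) -> list[dict]:
            # Each turn: {"content": str, "pinned": bool, "score": float}, where
            # "score" is some relevance measure against the current question.
            def cost(turn: dict) -> int:
                return len(turn["content"]) // 4  # crude token estimate

            indexed = list(enumerate(turns))
            keep = [(i, t) for i, t in indexed if t.get("pinned")]
            used = sum(cost(t) for _, t in keep)
            for i, t in sorted((p for p in indexed if not p[1].get("pinned")),
                               key=lambda p: p[1].get("score", 0.0), reverse=True):
                if used + cost(t) <= budget:
                    keep.append((i, t))
                    used += cost(t)
            # Restore conversational order before sending to the model.
            return [t for _, t in sorted(keep, key=lambda p: p[0])]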
  • robinduckett 317 days ago
    Can we use our own GPT4 api access ?
    • louiskw 317 days ago
      Soon; we’re working on open-sourcing our GPT proxy. As it’s not possible to self-serve sign up for a GPT4 API key, we haven’t prioritised it.
      • robinduckett 317 days ago
        Cool. I downloaded and tried out the app but it just says "Loading code line ranges" on all the returned results.
        • louiskw 317 days ago
          Can you share a screenshot (or, if the code is open source, the repo and query) with louis at bloop dot ai?

          I’ve seen a similar issue with the syntax highlighter and less common languages. Either way, will debug and fix.