Great product! I gave it a try and was really impressed!
One thing: could your team add an explanation somewhere on the home page to help users understand how their private code is handled (locally and when chat requests happen)?
The only place I could find that information is the HN launch thread; it would be nice if it were available on the home page!
I'm not currently working for a larger org... but I can certainly say that with my last employer, having this explicitly documented up front would have been crucial for adoption.
This looks super interesting! You mentioned that you'll be sharing some more details on the approach. I'm looking forward to learning more about how you're using the user's query to select relevant code to share with GPT-4.
I have been working on a related problem for my open source GPT coding tool. I am more focused on creating a GPT chat experience that can act as a "junior developer" to write and edit code in your git repo. GPT is great at writing fresh, self-contained code. I have been trying to improve its ability to make changes to a larger, complex repo.
I wrote up some notes on my current approach and some ideas for future work. I'll be curious to see how it compares to your approach, which seems more focused on search and code analysis.
https://aider.chat/docs/ctags.html
Very cool. I imagine your approach using files/folders + identifiers would work well for small repos, where everything can fit within an 8k prompt. An early prototype of our agent had something similar, but with just files/folders, no identifiers.
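The "files/folders + identifiers" idea can be sketched roughly like this. This is an illustrative stand-in, not either project's actual implementation (aider's version is ctags-based): a hypothetical `repo_map` helper uses Python's `ast` module to list each file's top-level definitions, producing a compact map small enough to include in a prompt.

```python
import ast

def repo_map(files: dict[str, str]) -> str:
    """Build a compact repo map: each file path followed by its
    top-level functions and classes, one per indented line."""
    lines = []
    for path, source in sorted(files.items()):
        lines.append(path)
        for node in ast.parse(source).body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                args = ", ".join(a.arg for a in node.args.args)
                lines.append(f"  def {node.name}({args})")
            elif isinstance(node, ast.ClassDef):
                lines.append(f"  class {node.name}")
    return "\n".join(lines)

# Toy in-memory "repo" with two tiny files.
files = {
    "app/models.py": "class User:\n    pass\n",
    "app/views.py": "def index(request):\n    return None\n",
}
print(repo_map(files))
```

The map gives the model a global view of what exists and where, without spending prompt tokens on full file contents.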
Our working thesis at the moment is that the only approach to context that scales to massive repos is embedding- or text-based retrieval.
Code search needs to be solved before code editing can be solved. You can't make changes if you can't find where in the repo the changes need to be made, unless you're willing to make the UX concession that users manually select which files need to be edited.
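As a toy illustration of retrieval-based context selection (not bloop's actual pipeline: a real system would use a learned embedding model, whereas this sketch substitutes a bag-of-words vector so it runs with no dependencies), the core loop is "embed the query, embed each code chunk, rank by cosine similarity":

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in "embedding": lowercase word counts. A real system would
    # call an embedding model here and get a dense vector instead.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_chunks(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank all indexed chunks against the query, keep the best k.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "fn close_eventsource(conn: &Connection) { conn.shutdown(); }",
    "fn render_sidebar(state: &AppState) -> Html { todo!() }",
    "async fn send_close_message(tx: Sender) { tx.send(Msg::Close).await; }",
]
print(top_chunks("where do we close the eventsource connection?", chunks, k=1))
```

Because only the top-ranked chunks are placed in the prompt, this scales to repos far larger than any context window, which is the thesis above.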
This looks really interesting, I wonder if we can get to the point of having this all be local to your machine. The idea of using embeddings to search for relevant code snippets is maybe obvious to some, but as someone just now getting into this stuff, it blew my mind!
I find that the hardest problems to track down are at the interfaces between projects, like a bad frontend call into a backend, or backend to backend. I wonder if this could index separate projects and draw links between them.
Semantic relationships between backend/frontend or microservices are super interesting.
We’re not far off. For example, if you index bloop itself with bloop and ask “What message does the backend send to indicate the frontend should close the eventsource?”, bloop will return a decent answer which takes into account the relevant frontend and backend code. This is an active area of improvement.
Generate as few tokens as possible: GPT-4 runs a few times to generate a single answer, and latency quickly becomes the biggest UX issue.
We abandoned most of the common thinking around chain-of-thought reasoning, finding it didn’t help accuracy much whilst increasing response times significantly. A full write-up will follow in the next week or so.
Exactly, you can see the prompt in this file [0]. I'm not sure how LangChain arrived at their default agent prompt, but you'll almost certainly want to write your own for performance reasons if you put something into production.
[0] https://github.com/BloopAI/bloop/blob/main/server/bleep/src/...
It's great that you got GPT-4 to explore the codebase using an agent approach. I tried this previously with gpt-3.5-turbo and have been meaning to revisit it since I got GPT-4 access.
I shared some notes on HN a while back on a variety of experiments I did with gpt-3.5-turbo: https://news.ycombinator.com/item?id=35441666
I haven’t tried Copilot Chat but I imagine the key difference is the context. bloop’s tuned to answer questions from anywhere in a repo, copilot chat uses the context of what you’re looking at in-IDE.
In my experience it can only take into account files you have open.
So if you need it to call a function in another file that hasn't been called in the current file, you'd have to open the other file in a new tab, then go back to the original, and finally get the correct completion.
Can it work without being granted access to the user's GitHub? Getting authorization to private repos shouldn't be necessary to work with random public repos on GitHub or local code only.
GitHub OAuth doesn't have a way to limit scope to specific repos, but the token is stored locally on your device, and the app's logic is that only repos you explicitly select are synced.
It's also a condition of many LLM providers that application end users are authenticated to prevent abuse, so GitHub auth helps with this.
When you say "your code", could I run this on anyone's GitHub project? What if I want to ask questions about how some code in an emulator works, or Doom, or about Vue?
This would be great, actually. I couldn't necessarily feed my company's code to this due to licensing concerns, but I'd love to point this at a Minecraft mod and ask how block X works, or if there's a console command to do Y, or how to construct a weapon with the most damage, etc.
Yes, I know that a decent proportion of the community uses bloop to understand open source repos. It can be especially helpful for repos that lack documentation.
Your questions in this thread aren't making a lot of sense. Microsoft owns GitHub, so obviously hosting your code there means "sending your intellectual property to Microsoft" in some sense.
Of course it does, but when I pay for a private repository, I expect it to be private. I've never seen or heard of any evidence that my private code is used to train GPT-4 and/or Copilot. So with all due respect, you and the parent aren't making much sense.
It would be an incredible breach of trust if MS was found to be doing this.
Love the idea of this - but it'd be great to have a list of the languages supported (I'd be wanting C# - it's not listed in the 'treesitter bindings' you've linked to, not sure if that means it isn't supported)
And of course, yes I'd need to be able to provide my gpt4 api key.
I really like how the product looks, congrats! Curious how your "write a dockerfile" example would work -- would it write a dockerfile completely autonomously, or (more likely) involve multiple iterations of the agent + human?
We’re experimenting with this, and the answer is yes and no.
GPT-4 is the only model that can just about run the agent execution, mainly due to context length and quality.
We use our own model for the embedding based code retrieval, and will be replacing some of the GPT3.5 calls with fine tuned models over the coming months.
Would be great if the app allowed connecting with local LLMs like text generation webui. As for quality, it's up to the user to choose their LLM, so I don't see this as relevant.
I'm sure the other engineers on the project will have their own opinions here, but for me there's the obviously visible parts to the project (the prompt, tools, ...) and the invisible parts (indexing, tokenising/chunking, parallelising, streaming, ...).
Building agents is an experimental process. You test an approach, maybe it works, and there's not always an obvious reason why certain experiments fail or succeed. We built three prototype agents in Python and JS, because those languages favour scrappy, fast iteration. This helped us quickly nail down our approach to the 'visible' parts.
Once we nailed down our approach, we rebuilt the agent in Rust because the speed and safety favoured all the 'invisible' parts of the project.
No shame at all, very cool! We wanted to go beyond the chat interface, and we have a complicated history management system that ensures answers are reliable in long threads, which wouldn't work with ChatGPT as it removes messages chronologically.
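bloop's history management system isn't described here, so as a purely hypothetical sketch of the general idea of pruning by importance rather than oldest-first: a hypothetical `trim_history` helper always keeps the system prompt and the original question (so long threads never lose the task statement), then fills the remaining token budget with the most recent turns.

```python
def trim_history(messages, budget,
                 n_tokens=lambda m: len(m["content"].split())):
    # Hypothetical strategy: pin the system prompt and the first user
    # message, then keep as many of the most recent turns as fit the
    # budget, instead of evicting oldest-first.
    keep = [m for m in messages if m["role"] == "system"][:1]
    first_user = next((m for m in messages if m["role"] == "user"), None)
    if first_user:
        keep.append(first_user)
    used = sum(n_tokens(m) for m in keep)
    tail = []
    for m in reversed(messages):
        if m in keep:
            continue
        cost = n_tokens(m)
        if used + cost > budget:
            break
        tail.append(m)
        used += cost
    return keep + list(reversed(tail))

messages = [
    {"role": "system", "content": "You answer questions about the repo"},
    {"role": "user", "content": "How does indexing work"},
    {"role": "assistant", "content": "It walks the repo and chunks each file"},
    {"role": "user", "content": "And embeddings"},
    {"role": "assistant", "content": "Each chunk is embedded and stored"},
]
print(trim_history(messages, budget=20))
```

With a chronological window, the original question would be the first thing dropped; here it survives while a middle turn is evicted, which is one way a purpose-built history system can keep long-thread answers reliable.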
Glad to hear you found it helpful, I’m around if you have any issues.
1. For people with access only to the gpt-3.5 API, does this still work?
2. Also, does it work for repositories not hosted on GitHub (e.g. GitLab, Bitbucket)?
2. It works with any git repo; you just have to clone it to your local machine first.
GitHub Copilot Chat purportedly (I'm also waiting for access) works across files in a workspace, which typically map to a repo.
When will people learn not to send their entire IP to Microsoft?
https://www.theverge.com/2022/11/8/23446821/microsoft-openai...
I.e. anything 'cloud'.
If you want to try it out today (not using your own API key), you could sync something open source.
GPT4 is the only model that can just about run the agent execution, mainly due to context length and quality.
We use our own model for the embedding based code retrieval, and will be replacing some of the GPT3.5 calls with fine tuned models over the coming months.
Building agents is an experimental process. You test an approach, maybe it works, and there's not always an obvious reason why certain experiments fail or succeed. We built three prototype agents in a Python and JS, because those languages favour scrappy fast iteration. This helped us quickly nail down our approach to the 'visible' parts.
Once we nailed down our approach, we rebuilt the agent in Rust because the speed and safety favoured all the 'invisible' parts of the project.
I’ve seen a similar issue with the syntax highlighter and less common languages. Either way, we'll debug and fix.