Language models on the command line

(simonwillison.net)

141 points | by rednafi 119 days ago

10 comments

  • simonw 119 days ago
    This was a workshop I gave on my https://llm.datasette.io/ CLI tool.

    What other CLI tools are people using to work with LLMs in the terminal?

    There's one comment here about https://github.com/paul-gauthier/aider, and Ollama is probably the most widely used CLI tool at the moment: https://github.com/ollama/ollama/blob/main/README.md#quickst...

    • throwup238 119 days ago
      > What other CLI tools are people using to work with LLMs in the terminal?

      I use aichat: https://github.com/sigoden/aichat

      I especially like the terminal integration where I can type a natural language request at the terminal and press Alt+E to have it converted to a command to run.

      • mikeqq2024 116 days ago
        There is the similar ShellOracle: https://github.com/djcopley/ShellOracle

        It is used as a Bash widget: you press Ctrl-F, describe your command, then press Enter, and the generated command is inserted into your shell prompt.
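        A minimal sketch of the widget pattern such tools rely on (the function names here are made up, and generate_command stubs out the actual LLM call):

```shell
# Stand-in for the LLM call: the real widget sends the typed
# description to a model and reads back a shell command.
generate_command() {
  echo "find . -name '*.log' -mtime +7 -delete"
}

# A readline widget: replace the current input line with the
# generated command and move the cursor to the end of it.
_llm_widget() {
  READLINE_LINE=$(generate_command "$READLINE_LINE")
  READLINE_POINT=${#READLINE_LINE}
}

# In an interactive bash session this would be bound with:
#   bind -x '"\C-f": _llm_widget'
```

        Because the result lands on the prompt rather than executing, you get a chance to review or tweak it first.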

      • kelvie 119 days ago
        I also use aichat. For some reason (well, I know the reason, it's shell quoting) I don't like passing prompts in via command line, so having a simple text-based TUI like a readline prompt is nice.
    • FergusArgyll 119 days ago
      I made my own Python wrapper around the Gemini API. I wanted one feature: there is a default system prompt, but -s overrides it with a temporary system prompt. -p is for the prompt; if it is left out, the CLI is interactive (one long chat until I exit), but with -p I can use regular Linux utils.

      For example, I wrote a short bash script which uses yt-dlp and ffmpeg to download a song and convert it to my preferred format, and then uses gemini to add metadata.

        artist=$(gemini -s "Please respond with the name of the Artist based on 
        the songs title. do not use any other words, just the artist name.
        example:
        'Bruce Springsteen - Old Dan Tucker [S-GHbDFrwlU].opus'
        Bruce Springsteen" -p "$opus_file" | tr -d '\n')
    • nyellin 119 days ago
      Most of the cli tools just wrap an LLM, but don't give it access to the data it needs to be useful. Aider is an exception of course - it gives great results because it feeds the LLM your source files.

      We built http://github.com/robusta-dev/holmesgpt/ to investigate Prometheus/Jira/PagerDuty issues. We're able to get pretty good results (we benchmark extensively) because we use function calling to give the LLM read access to relevant data. I think we're the only open source AIOps tool, and the only AIOps tool period, that does something more complex than RAG + summarization.

    • pjot 119 days ago
      OpenAI's Python client actually includes a CLI - I'm not sure if it's really documented anywhere.

        $ openai api chat.completions.create -g 'user' 'say hello' -m 'gpt-3.5-turbo'
        Hello! How can I assist you today?
    • evmar 119 days ago
      I am using a hacky one I wrote myself.

      I looked at llm but it doesn't appear to have a mechanism for multi-shot prompting, where you provide both prompts and responses within your query. (Ref https://platform.openai.com/docs/guides/prompt-engineering/t... .) Maybe take this as a feature request?

      It feels like the 'template' system in llm might be able to encompass this but the docs don't appear to provide a reference for the yaml format, only examples. I guess that is another feature request, sorry!

      (BTW if you haven't seen https://docs.divio.com/documentation-system/ it really changed how I think about documentation)

      • simonw 119 days ago
        Yeah, I've been thinking a bit about the multi shot thing. I've had great results from Claude by "faking" the previous conversation to include example question/answer pairs.

        With LLM you can do that using the undocumented Python Conversation API, but it's undocumented for a reason (I don't think it's good enough yet). You could also fake a previous conversation through the CLI tool but that is VERY undocumented - you would have to write fake rows into the SQLite database!

        I also want to support the Claude thing where you can prefill the start of the response - amazing for things like forcing HTML by prefilling an HTML doctype.
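        The faked-conversation trick is just few-shot prompting expressed at the chat-API level: example question/answer pairs are prepended as if they were earlier turns. A generic sketch (the message-list format is the common chat-completions shape, and the example pairs are illustrative, not LLM's actual API):

```python
# Build a "fake" prior conversation: each example Q/A pair becomes
# a user turn followed by an assistant turn that never happened.
examples = [
    ("Bruce Springsteen - Old Dan Tucker [S-GHbDFrwlU].opus", "Bruce Springsteen"),
    ("Daft Punk - One More Time.opus", "Daft Punk"),
]

messages = [{"role": "system", "content": "Reply with only the artist name."}]
for question, answer in examples:
    messages.append({"role": "user", "content": question})
    messages.append({"role": "assistant", "content": answer})

# The real prompt goes last, so the model continues the pattern.
messages.append({"role": "user", "content": "Nina Simone - Sinnerman.opus"})
```

        The same structure would work for the prefill idea too: append a partial assistant turn (e.g. just a doctype) as the final message, for providers that support continuing it.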

        • evmar 119 days ago
          In case it helps any, here are some more details about what I used it for. (Summary: providing examples of a specific translation task, to make it use a particular API in the output.) https://inuh.net/@evmar/112001414385042731

          And what I expected of llm is for the template file to (optionally) contain an array of prompt/response pairs. You could even imagine the --save flag constructing one from an ongoing conversation of llm -c maybe.

          • simonw 119 days ago
            Yeah, doing this with templates is a great idea.
    • deathmonger5000 119 days ago
      I created a CLI tool called Promptr, an open source developer tool that lets you modify your codebase using plain language. It sends your query along with the relevant source code to an LLM, and the LLM's changes are applied directly to your filesystem, eliminating the need for copy-pasting. Promptr is implemented in JavaScript and incorporates liquidjs templating, so users can build a library of reusable prompt templates for common tasks and contexts.

      You can find out more here: https://github.com/ferrislucas/promptr

    • bt1a 119 days ago
      (aichat in a tangential comment looks much better.) Besides aider and ollama, I think shell_gpt https://github.com/TheR1D/shell_gpt is great for quick chats/lookups. Being able to quickly cat files into REPL sessions saves a lot of time.

      I need to integrate distil whisper large v3, aider, and shell_gpt to tidy up a lot of my disjointed LLM use. As someone else mentioned, the commits created by aider allow me to "skip" some intermediate steps that would be required when working on coding tasks with other frameworks.

    • snthpy 119 days ago
      I haven't tried it yet but this appeared a few days ago and I'm a big fan of Textual.

      https://github.com/paulrobello/parllama

      • mark_l_watson 119 days ago
        Thanks for sharing that, I also like Textual. I will try parllama and compare it to Gollama.
    • librasteve 119 days ago
      The raku LLM modules are excellent CLI and Jupyter notebook tools. Suggest you start here https://raku.land/?q=LLM%3A%3APrompts and review the comprehensive videos and examples from here https://www.reddit.com/r/rakulang/s/W1UqivfFA9
    • josephrmartinez 119 days ago
      A CLI tool for generating tutorials based on the work you recently committed: https://github.com/josephrmartinez/mktute

      npm i mktute

      You can select between local model (ollama), claude 3.5 sonnet, or gpt-4. I've been surprised to find sonnet much better in performance and price for this task.

    • fragmede 118 days ago
      Open interpreter.

      Open Interpreter lets LLMs run code on your computer to complete tasks, e.g. "fix my python" or "make a movie out of the pictures in this directory."

      https://github.com/OpenInterpreter/open-interpreter

    • lynx23 119 days ago
      I am experimenting with my own tool[1], which is built using asyncio and prompt-toolkit. It made it particularly easy to define function_tools via a decorator which can automagically transform async defs into pydantic models.

      [1] https://github.com/mlang/ass

    • mark_l_watson 119 days ago
      My workspace for experimenting with LLMs is chaotic: currently about 100 Python scripts that each have a single purpose. I sometimes use Ollama or Gollama from the command line. Since I am an old Lisp hacker, I also have a large collection of short Racket and Common Lisp LLM experiments.

      Chaos.

      • nbbaier 114 days ago
        You should use an LLM to classify and consolidate them lol
    • Jaydenaus 119 days ago
      I've been using this one: https://github.com/egoist/shell-ask

      Very similar, although the ability to continue a conversation like you can in yours is a killer feature I wish it had.

    • saulpw 119 days ago
      I've appreciated https://github.com/cthulahoops/chatcli which is simple and straightforward.
  • dvt 119 days ago
    Fantastic work here! I'm working on a local tool, affectionately called Descartes, which does something similar—but with a spotlight-like UX for the non-hackers out there.

    I do think that LLMs have the potential to fundamentally change the way we interact with our computers. There are a lot of edge cases (especially when combining it with the inaccurate science of screen readers), but it's pretty mind-blowing when it works. I'm working on a blog post, but here's my little proof of concept working on both Windows in a web browser[1] and macOS in the Finder [2].

    [1] https://vimeo.com/931907811

    [2] https://dvt.name/wp-content/uploads/2024/04/image-11.png

    • iJohnDoe 119 days ago
      Really cool! Do you plan to release it?
  • bagels 119 days ago
    I wrote one to help with creating command-line commands. It just hits the OpenAI API with a prompt asking for just a code block to run in bash plus whatever is passed in, and then it prints the command out. I wrote it because I can never remember all the weird command args for all the tools.

      $ bashy find large files over 10 gb
      find / -type f -size +10G
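    The pattern behind a wrapper like this is a prompt that demands only a command, plus a little cleanup, since models often wrap the reply in a code fence anyway. A stubbed sketch (the prompt wording and function names are illustrative, not the actual tool):

```python
PROMPT = "Reply with only a single bash command, no explanation.\nTask: {task}"

def extract_command(reply):
    # Models frequently fence the command anyway; drop any fence
    # marker lines and keep just the command itself.
    lines = [l for l in reply.strip().splitlines() if not l.startswith("```")]
    return "\n".join(lines).strip()

# A reply shaped like a typical model response:
reply = "```bash\nfind / -type f -size +10G\n```"
print(extract_command(reply))  # find / -type f -size +10G
```

    Printing the result rather than executing it is what makes the review-before-run workflow described below possible.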

    • QuiDortDine 119 days ago
      Do you have a followup command to execute the suggested command?
      • bagels 119 days ago
        No, I want to review or tweak them, just in case it's trying to do something bad. It's usually pretty good though.
        • godelski 119 days ago
          If you use zsh and are willing to source your scripts you can do

            print -z $command
          
          And the command will appear on your cli as if you had written it.

          I don't think you can do this in bash. Interestingly, this is something that seems quite difficult both to google and to ask GPT for help with. Both get confused and think a different question is being asked, probably because there are similar, more common questions, and the subtleties of possible wordings make it difficult to differentiate.

      • sdf4j 119 days ago
        $ bashy evaluate the previous command output
  • bearjaws 119 days ago
    Aider continues to be the best way to interact with LLMs while coding, and it's a command-line tool.

    Copilot is pretty good, but the change > commit > QA process that Aider forces you through is really powerful.

  • jillesvangurp 119 days ago
    > We have implemented basic RAG—Retrieval Augmented Generation, where search results are used to answer a question—using a terminal script that scrapes search results from Google and pipes them into an LLM.

    I love this. Simple and effective. RAG is just search leveled up with LLMs. Such an obvious thing to do. We know how to do search and can use it to unlock vast amounts of knowledge. Instead of letting LLMs dream up facts by compressing all knowledge into them, a better use is letting them summarize and reason about the facts they find. IMHO the art is actually going to be in letting them come up with the right query as well. Or queries. They could be a lot more exhaustive in their searches than we could be.
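    The basic-RAG recipe quoted above is small enough to sketch: fetch results, join them into a context block, and put the question after it. A stubbed Python sketch (fetch_results fakes the scraping step, and the prompt wording is illustrative):

```python
def fetch_results(query):
    # Stand-in for scraping a search engine: (title, snippet) pairs.
    return [
        ("Wikipedia", "Retrieval-augmented generation feeds search results to a model."),
        ("Blog post", "RAG is search leveled up with LLMs."),
    ]

def build_rag_prompt(query, results):
    # Concatenate the results into a context block the model must stick to.
    context = "\n\n".join(f"{title}:\n{snippet}" for title, snippet in results)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

prompt = build_rag_prompt("What is RAG?", fetch_results("What is RAG?"))
# `prompt` would then be piped into a model, e.g. via a CLI tool.
```

    The "art" of query generation mentioned above would slot in before fetch_results: ask the model to propose one or more search queries first, then run this pipeline per query.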

    • RansomStark 119 days ago
      While I agree with the sentiment in general, I've come to the conclusion that what I really want is the flexibility of the natural language interface that LLMs provide, and the return of the correct document. No 'reasoning', no summarizing, just better search [0].

      The issue with the current generation of models is that they can't reason, they may do very well at pretending to reason, but they can't [1]. Reasoning requires the ability to identify and reuse patterns, and while there has been some advancement in this area [2] with getting models to learn the underlying pattern and rule, it doesn't generalize. This results in models that will happily tell you that a statement is both true and false, and be unable to identify the logical problem with that.

      Even creating summaries is difficult, and LLMs are more than happy to hallucinate even when summarizing documents, providing incorrect or entirely made-up facts [3]. The general workaround is multiple runs over the same work and averaging the responses, but that's a lot of work, and energy.

      [0] https://win-vector.com/2024/05/21/i-want-flexible-queries-no...

      [1] https://medium.com/@konstantine_45825/gpt-4-cant-reason-2eab...

      [2] https://arxiv.org/abs/2405.15071

      [3] https://community.openai.com/t/gpt-4o-hallucinating-at-temp-...

      • mark_l_watson 119 days ago
        I have an experiment that you can reproduce: use a search API (e.g., Brave or DuckDuckGo) and, for each result, ask a local LLM to rate it as useful or not useful. Then fetch the full web pages of 'useful' results and ask for summaries made with the original search query in mind. I like to look at these summaries, and then I take one more pass, asking for the concatenated summaries to be summarized as a group into a concise, de-duplicated final summary.

        Anyway, I enjoy playing with this, and because I am using my own little Python scripts, I can switch models and hack on them easily.
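        The loop described above (rate each result, summarize the useful ones, then summarize the summaries) can be sketched with the model calls stubbed out (rate_result and summarize stand in for the local-LLM calls, and the data is made up):

```python
def rate_result(query, result):
    # Stand-in for asking a local LLM "useful or not useful?".
    return query.lower() in result["snippet"].lower()

def summarize(text):
    # Stand-in for an LLM summarization call: keep the first sentence.
    return text.split(". ")[0]

def search_and_summarize(query, results):
    useful = [r for r in results if rate_result(query, r)]
    summaries = [summarize(r["page"]) for r in useful]
    # Final pass: summarize the concatenated summaries as a group.
    return summarize(" ".join(summaries))

results = [
    {"snippet": "RAG combines search with LLMs",
     "page": "RAG pipelines feed search results into a model. The rest is pruned."},
    {"snippet": "a cooking recipe", "page": "How to bake bread. Step one."},
]
final = search_and_summarize("rag", results)
```

        Swapping models then just means swapping the bodies of rate_result and summarize, which is what makes the little-scripts approach easy to hack on.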

  • liamYC 119 days ago
    This is awesome, thanks for sharing. I find this kind of tool really useful, Aider in particular. I made my own CLI tool for interacting with GPT. It's really useful with the -c flag for generating code, especially bash commands I've forgotten.

    https://github.com/ljsimpkin/chat-gpt-cli-tool

  • jgalt212 119 days ago
    Every time I see an LLM demo, I'm blown away. Every time I use one for myself, I feel like a fool.

    I say this because the scraper demo bit looks very neat, but I've been down this path before and I don't want to waste my time getting bad or deceptively incorrect results.

    • ziml77 119 days ago
      When you're looking at a demo, the results just have to look good enough for the demo to be impressive. When you're actually using it, the results actually need to be good.
  • behnamoh 119 days ago
    I wish llm were more stable, but unfortunately things just kept breaking out of the blue without me touching any of the program's settings. I often had to reinstall the package, but finally I gave up and implemented my own.
    • simonw 119 days ago
      Which plugins are you using? Did you install via pipx or Homebrew or something else?
      • behnamoh 119 days ago
        I used pipx, and I used AnthropicAI's plugin to use Claude. After two weeks of working perfectly, I suddenly got the "this model is invalid" error.

        PS: I appreciate the work put into llm though - it's a neat program I used with my Automator scripts to bring LLMs to macOS before Apple Intelligence was announced. I just wish stability were not a concern.

        • simonw 119 days ago
          That's weird. Were you using llm-claude or llm-claude-3?
        • foobarqux 119 days ago
          The plugins seem to need to be reinstalled after every upgrade
      • jimmcslim 119 days ago
        I followed along with the blog post, but got stuck with llm-cmd not working on macOS. Looks like this PR would fix it: https://github.com/simonw/llm-cmd/pull/12
  • qiakai 119 days ago
    > What other CLI tools are people using to work with LLMs in the terminal?

    I personally love using x-cmd: small size (1.1MB), open source, interactive operation.

    [1] https://www.x-cmd.com/

    [2] https://www.x-cmd.com/mod/openai

  • etx 119 days ago
    [flagged]