Language models on the command line

(simonwillison.net)

141 points | by rednafi 119 days ago

10 comments

  • simonw 119 days ago
    This was a workshop I gave on my https://llm.datasette.io/ CLI tool.

    What other CLI tools are people using to work with LLMs in the terminal?

    There's one comment here about https://github.com/paul-gauthier/aider, and Ollama is probably the most widely used CLI tool at the moment: https://github.com/ollama/ollama/blob/main/README.md#quickst...

    • throwup238 119 days ago
      > What other CLI tools are people using to work with LLMs in the terminal?

      I use aichat: https://github.com/sigoden/aichat

      I especially like the terminal integration where I can type a natural language request at the terminal and press Alt+E to have it converted to a command to run.

      • mikeqq2024 116 days ago
        There is the similar ShellOracle: https://github.com/djcopley/ShellOracle

        It is used as a Bash widget: you press Ctrl-F, describe your command, then press Enter, and the generated command is inserted into your shell prompt.
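        A minimal sketch of the widget pattern such tools rely on (the function names here are made up, and generate_command stubs out the actual LLM call):

```shell
# Stand-in for the LLM call: the real widget sends the typed
# description to a model and reads back a shell command.
generate_command() {
  echo "find . -name '*.log' -mtime +7 -delete"
}

# A readline widget: replace the current input line with the
# generated command and move the cursor to the end of it.
_llm_widget() {
  READLINE_LINE=$(generate_command "$READLINE_LINE")
  READLINE_POINT=${#READLINE_LINE}
}

# In an interactive bash session this would be bound with:
#   bind -x '"\C-f": _llm_widget'
```

        Because the result lands on the prompt rather than executing, you get a chance to review or tweak it first.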

      • kelvie 119 days ago
        I also use aichat. For some reason (well, I know the reason, it's shell quoting) I don't like passing prompts in via command line, so having a simple text-based TUI like a readline prompt is nice.
    • FergusArgyll 119 days ago
      I made my own Python wrapper around the Gemini API. I wanted one feature: there is a default system prompt, but -s overrides it with a temporary system prompt. -p is for the prompt; if it is left out, the CLI is interactive (one long chat until I exit), but with -p I can use regular Linux utils.

      For example, I wrote a short bash script which uses yt-dlp and ffmpeg to download a song and convert it to my preferred format, and then uses gemini to add metadata.

        artist=$(gemini -s "Please respond with the name of the Artist based on 
        the songs title. do not use any other words, just the artist name.
        example:
        'Bruce Springsteen - Old Dan Tucker [S-GHbDFrwlU].opus'
        Bruce Springsteen" -p "$opus_file" | tr -d '\n')
    • nyellin 119 days ago
      Most of the cli tools just wrap an LLM, but don't give it access to the data it needs to be useful. Aider is an exception of course - it gives great results because it feeds the LLM your source files.

      We built http://github.com/robusta-dev/holmesgpt/ to investigate Prometheus/Jira/PagerDuty issues. We're able to get pretty good results (we benchmark extensively) because we use function calling to give the LLM read access to relevant data. I think we're the only open source AIOps tool, and the only AIOps tool period, that does something more complex than RAG + summarization.

    • pjot 119 days ago
      OpenAI's Python client actually includes a CLI - I'm not sure if it's really documented anywhere.

        $ openai api chat.completions.create -g 'user' 'say hello' -m 'gpt-3.5-turbo'
        Hello! How can I assist you today?
    • evmar 119 days ago
      I am using a hacky one I wrote myself.

      I looked at llm but it doesn't appear to have a mechanism for multi-shot prompting, where you provide both prompts and responses within your query. (Ref https://platform.openai.com/docs/guides/prompt-engineering/t... .) Maybe take this as a feature request?

      It feels like the 'template' system in llm might be able to encompass this but the docs don't appear to provide a reference for the yaml format, only examples. I guess that is another feature request, sorry!

      (BTW if you haven't seen https://docs.divio.com/documentation-system/ it really changed how I think about documentation)

      • simonw 119 days ago
        Yeah, I've been thinking a bit about the multi shot thing. I've had great results from Claude by "faking" the previous conversation to include example question/answer pairs.

        With LLM you can do that using the undocumented Python Conversation API, but it's undocumented for a reason (I don't think it's good enough yet). You could also fake a previous conversation through the CLI tool but that is VERY undocumented - you would have to write fake rows into the SQLite database!

        I also want to support the Claude thing where you can prefill the start of the response - amazing for things like forcing HTML by prefilling an HTML doctype.
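        The faked-conversation trick is just few-shot prompting expressed at the chat-API level: example question/answer pairs are prepended as if they were earlier turns. A generic sketch (the message-list format is the common chat-completions shape, and the example pairs are illustrative, not LLM's actual API):

```python
# Build a "fake" prior conversation: each example Q/A pair becomes
# a user turn followed by an assistant turn that never happened.
examples = [
    ("Bruce Springsteen - Old Dan Tucker [S-GHbDFrwlU].opus", "Bruce Springsteen"),
    ("Daft Punk - One More Time.opus", "Daft Punk"),
]

messages = [{"role": "system", "content": "Reply with only the artist name."}]
for question, answer in examples:
    messages.append({"role": "user", "content": question})
    messages.append({"role": "assistant", "content": answer})

# The real prompt goes last, so the model continues the pattern.
messages.append({"role": "user", "content": "Nina Simone - Sinnerman.opus"})
```

        The same structure would work for the prefill idea too: append a partial assistant turn (e.g. just a doctype) as the final message, for providers that support continuing it.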

        • evmar 119 days ago
          In case it helps any, here are some more details about what I used it for. (Summary: providing examples of a specific translation task, to make it use a particular API in the output.) https://inuh.net/@evmar/112001414385042731

          And what I expected of llm is for the template file to (optionally) contain an array of prompt/response pairs. You could even imagine the --save flag constructing one from an ongoing conversation of llm -c maybe.

          • simonw 119 days ago
            Yeah, doing this with templates is a great idea.
    • deathmonger5000 119 days ago
      I created a CLI tool called Promptr, an open source developer tool that lets you modify your codebase using plain language. It sends your query along with the relevant source code to an LLM, and the LLM's changes are applied directly to your filesystem, eliminating the need for copy-pasting. Promptr is implemented in JavaScript and incorporates liquidjs templating, so users can build a library of reusable prompt templates for common tasks and contexts.

      You can find out more here: https://github.com/ferrislucas/promptr

    • bt1a 119 days ago
      (aichat in a tangential comment looks much better.) Besides aider and ollama, I think shell_gpt https://github.com/TheR1D/shell_gpt is great for quick chats/lookups. Being able to quickly cat files into REPL sessions saves a lot of time.

      I need to integrate distil whisper large v3, aider, and shell_gpt to tidy up a lot of my disjointed LLM use. As someone else mentioned, the commits created by aider allow me to "skip" some intermediate steps that would be required when working on coding tasks with other frameworks.

    • snthpy 119 days ago
      I haven't tried it yet but this appeared a few days ago and I'm a big fan of Textual.

      https://github.com/paulrobello/parllama

      • mark_l_watson 119 days ago
        Thanks for sharing that, I also like Textual. I will try parllama and compare it to Gollama.
    • librasteve 119 days ago
      The raku LLM modules are excellent CLI and Jupyter notebook tools. Suggest you start here https://raku.land/?q=LLM%3A%3APrompts and review the comprehensive videos and examples from here https://www.reddit.com/r/rakulang/s/W1UqivfFA9
    • josephrmartinez 119 days ago
      A CLI tool for generating tutorials based on the work you recently committed: https://github.com/josephrmartinez/mktute

      npm i mktute

      You can select between local model (ollama), claude 3.5 sonnet, or gpt-4. I've been surprised to find sonnet much better in performance and price for this task.

    • fragmede 118 days ago
      Open interpreter.

      Open Interpreter lets LLMs run code on your computer to complete tasks, e.g. "fix my python" or "make a movie out of the pictures in this directory."

      https://github.com/OpenInterpreter/open-interpreter

    • lynx23 119 days ago
      I am experimenting with my own tool[1], which is built using asyncio and prompt-toolkit. It made it particularly easy to define function_tools via a decorator which can automagically transform async defs into pydantic models.

      [1] https://github.com/mlang/ass

    • mark_l_watson 119 days ago
      My workspace for experimenting with LLMs is chaotic: currently about 100 Python scripts that each have a single purpose. I sometimes use Ollama or Gollama from the command line. Since I am an old Lisp hacker, I also have a large collection of short Racket and Common Lisp LLM experiments.

      Chaos.

      • nbbaier 114 days ago
        You should use an LLM to classify and consolidate them lol
    • Jaydenaus 119 days ago
      I've been using this one: https://github.com/egoist/shell-ask

      Very similar, although the ability to continue a conversation like you can in yours is a killer feature I wish it had.

    • saulpw 119 days ago
      I've appreciated https://github.com/cthulahoops/chatcli which is simple and straightforward.
  • dvt 119 days ago
    Fantastic work here! I'm working on a local tool, affectionately called Descartes, which does something similar—but with a spotlight-like UX for the non-hackers out there.

    I do think that LLMs have the potential to fundamentally change the way we interact with our computers. There are a lot of edge cases (especially when combining it with the inaccurate science of screen readers), but it's pretty mind-blowing when it works. I'm working on a blog post, but here's my little proof of concept working on both Windows in a web browser[1] and macOS in the Finder [2].

    [1] https://vimeo.com/931907811

    [2] https://dvt.name/wp-content/uploads/2024/04/image-11.png

    • iJohnDoe 119 days ago
      Really cool! Do you plan to release it?
  • bagels 119 days ago
    I wrote one to help with creating command-line commands. It just hits the OpenAI API with a prompt asking for just a code block to run in bash plus whatever is passed in, and then it prints the command out. I wrote it because I can never remember all the weird command args for all the tools.

      $ bashy find large files over 10 gb
      find / -type f -size +10G
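    The pattern behind a wrapper like this is a prompt that demands only a command, plus a little cleanup, since models often wrap the reply in a code fence anyway. A stubbed sketch (the prompt wording and function names are illustrative, not the actual tool):

```python
PROMPT = "Reply with only a single bash command, no explanation.\nTask: {task}"

def extract_command(reply):
    # Models frequently fence the command anyway; drop any fence
    # marker lines and keep just the command itself.
    lines = [l for l in reply.strip().splitlines() if not l.startswith("```")]
    return "\n".join(lines).strip()

# A reply shaped like a typical model response:
reply = "```bash\nfind / -type f -size +10G\n```"
print(extract_command(reply))  # find / -type f -size +10G
```

    Printing the result rather than executing it is what makes the review-before-run workflow described below possible.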

    • QuiDortDine 119 days ago
      Do you have a followup command to execute the suggested command?
      • bagels 119 days ago
        No, I want to review or tweak them, just in case it's trying to do something bad. It's usually pretty good though.
        • godelski 119 days ago
          If you use zsh and are willing to source your scripts you can do

            print -z $command
          
          And the command will appear on your cli as if you had written it.

          I don't think you can do this in bash. Interestingly, this is something that seems quite difficult both to google and to ask GPT for help with. Both get confused and think a different question is being asked, probably because there are similar, more common questions, and the subtleties of possible wordings make it difficult to differentiate.

      • sdf4j 119 days ago
        $ bashy evaluate the previous command output
  • bearjaws 119 days ago
    Aider continues to be the best way to interact with LLMs while coding, and it's a command-line tool.

    Copilot is pretty good, but the change > commit > QA process that Aider forces you through is really powerful.

  • jillesvangurp 119 days ago
    > We have implemented basic RAG—Retrieval Augmented Generation, where search results are used to answer a question—using a terminal script that scrapes search results from Google and pipes them into an LLM.

    I love this. Simple and effective. RAG is just search leveled up with LLMs. Such an obvious thing to do. We know how to do search and can use it to unlock vast amounts of knowledge. Instead of letting LLMs dream up facts by compressing all knowledge into them, a better use is letting them summarize and reason about the facts they find. IMHO the art is actually going to be in letting them come up with the right query as well. Or queries. They could be a lot more exhaustive in their searches than we could be.
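    The basic-RAG recipe quoted above is small enough to sketch: fetch results, join them into a context block, and put the question after it. A stubbed Python sketch (fetch_results fakes the scraping step, and the prompt wording is illustrative):

```python
def fetch_results(query):
    # Stand-in for scraping a search engine: (title, snippet) pairs.
    return [
        ("Wikipedia", "Retrieval-augmented generation feeds search results to a model."),
        ("Blog post", "RAG is search leveled up with LLMs."),
    ]

def build_rag_prompt(query, results):
    # Concatenate the results into a context block the model must stick to.
    context = "\n\n".join(f"{title}:\n{snippet}" for title, snippet in results)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

prompt = build_rag_prompt("What is RAG?", fetch_results("What is RAG?"))
# `prompt` would then be piped into a model, e.g. via a CLI tool.
```

    The "art" of query generation mentioned above would slot in before fetch_results: ask the model to propose one or more search queries first, then run this pipeline per query.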

    • RansomStark 119 days ago
      While I agree with the sentiment in general, I've come to the conclusion that what I really want is the flexibility of the natural language interface that LLMs provide, and the return of the correct document. No 'reasoning', no summarizing, just better search [0].

      The issue with the current generation of models is that they can't reason, they may do very well at pretending to reason, but they can't [1]. Reasoning requires the ability to identify and reuse patterns, and while there has been some advancement in this area [2] with getting models to learn the underlying pattern and rule, it doesn't generalize. This results in models that will happily tell you that a statement is both true and false, and be unable to identify the logical problem with that.

      Even creating summaries is difficult, and LLMs are more than happy to hallucinate even when summarizing documents, providing incorrect or entirely made-up facts [3]. The general workaround is multiple runs over the same work and averaging the responses, but that's a lot of work, and energy.

      [0] https://win-vector.com/2024/05/21/i-want-flexible-queries-no...

      [1] https://medium.com/@konstantine_45825/gpt-4-cant-reason-2eab...

      [2] https://arxiv.org/abs/2405.15071

      [3] https://community.openai.com/t/gpt-4o-hallucinating-at-temp-...

      • mark_l_watson 119 days ago
        I have an experiment that you can reproduce: use a search API (e.g., Brave or DuckDuckGo) and, for each result, ask a local LLM to rate it as useful or not useful. Then fetch the full web pages of 'useful' results and ask for summaries made with the original search query in mind. I like to look at these summaries, and then I take one more pass, asking for the concatenated summaries to be summarized as a group into a concise, de-duplicated final summary.

        Anyway, I enjoy playing with this, and because I am using my own little Python scripts, I can switch models and hack on them easily.
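        The loop described above (rate each result, summarize the useful ones, then summarize the summaries) can be sketched with the model calls stubbed out (rate_result and summarize stand in for the local-LLM calls, and the data is made up):

```python
def rate_result(query, result):
    # Stand-in for asking a local LLM "useful or not useful?".
    return query.lower() in result["snippet"].lower()

def summarize(text):
    # Stand-in for an LLM summarization call: keep the first sentence.
    return text.split(". ")[0]

def search_and_summarize(query, results):
    useful = [r for r in results if rate_result(query, r)]
    summaries = [summarize(r["page"]) for r in useful]
    # Final pass: summarize the concatenated summaries as a group.
    return summarize(" ".join(summaries))

results = [
    {"snippet": "RAG combines search with LLMs",
     "page": "RAG pipelines feed search results into a model. The rest is pruned."},
    {"snippet": "a cooking recipe", "page": "How to bake bread. Step one."},
]
final = search_and_summarize("rag", results)
```

        Swapping models then just means swapping the bodies of rate_result and summarize, which is what makes the little-scripts approach easy to hack on.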

  • liamYC 119 days ago
    This is awesome, thanks for sharing. I find this kind of tool really useful, Aider in particular. I made my own CLI tool for interacting with GPT. It's really useful with the -c flag for generating code, especially bash commands I've forgotten.

    https://github.com/ljsimpkin/chat-gpt-cli-tool

  • jgalt212 119 days ago
    Every time I see an LLM demo, I'm blown away. Every time I use one for myself, I feel like a fool.

    I say this because the scraper demo bit looks very neat, but I've been down this path before and I don't want to waste my time getting bad or deceptively incorrect results.

    • ziml77 119 days ago
      When you're looking at a demo, the results just have to look good enough for the demo to be impressive. When you're actually using it, the results actually need to be good.
  • behnamoh 119 days ago
    I wish llm were more stable, but unfortunately things just kept breaking out of the blue without me touching any of the program's settings. I often had to reinstall the package, but finally I gave up and implemented my own.
    • simonw 119 days ago
      Which plugins are you using? Did you install via pipx or Homebrew or something else?
      • behnamoh 119 days ago
        I used pipx, and I used AnthropicAI's plugin to use Claude. After two weeks of working perfectly, I suddenly got the "this model is invalid" error.

        PS: I appreciate the work put into llm though - it's a neat program I used with my Automator scripts to bring LLMs to macOS before Apple Intelligence was announced. I just wish stability were not a concern.

        • simonw 119 days ago
          That's weird. Were you using llm-claude or llm-claude-3?
        • foobarqux 119 days ago
          The plugins seem to need to be reinstalled after every upgrade
      • jimmcslim 119 days ago
        I followed along with the blog post, but got stuck with llm-cmd not working on macOS. Looks like this PR would fix it: https://github.com/simonw/llm-cmd/pull/12
  • qiakai 119 days ago
    > What other CLI tools are people using to work with LLMs in the terminal?

    I personally love using x-cmd: small size (1.1MB), open source, interactive operation.

    [1] https://www.x-cmd.com/

    [2] https://www.x-cmd.com/mod/openai

  • etx 119 days ago
    [flagged]