9 comments

  • causal 15 days ago
    This reflects the fact that a lot of working with LLMs is just organizing text. Prompts become a real engineering problem when you are orchestrating pipelines across dozens of files, with completions at various points and context windows of 100K tokens or more.

    I haven't found a satisfying framework yet and generally find raw Python best. But I still spend too much time on boilerplate: tweaking formatting, samplers, and chunking for context windows.

    If anyone knows of a better tool for abstracting that away (LangChain is not it IMO) please let me know.
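
For a sense of the chunking boilerplate described above, here is a minimal sketch in raw Python. The names `estimate_tokens` and `chunk_by_budget` are hypothetical, and the 4-characters-per-token ratio is an assumption rather than a real tokenizer, so treat the budgets as approximate.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; the 4-chars-per-token ratio is an assumption."""
    return max(1, round(len(text) / chars_per_token))


def chunk_by_budget(paragraphs: list[str], max_tokens: int) -> list[str]:
    """Greedily pack paragraphs into chunks of roughly max_tokens or less."""
    chunks: list[str] = []
    current: list[str] = []
    used = 0
    for para in paragraphs:
        cost = estimate_tokens(para)
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and used + cost > max_tokens:
            chunks.append("\n\n".join(current))
            current, used = [], 0
        # A paragraph larger than the budget still becomes its own oversized chunk.
        current.append(para)
        used += cost
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Swapping `estimate_tokens` for the model's actual tokenizer would make the budget exact. The greedy packing stays the same.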

  • _andrei_ 15 days ago
    Pretty nice, made a CLI app for this as well, seems like a common need: https://github.com/3rd/promptpack

    But sending whole files isn't always optimal, I'm thinking there has to be a better way, like picking workspace symbols and pulling in only the code they depend on from other files. Something something LSP/tree-sitter-based.
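
One way to picture the "pick a symbol, pull in only what it depends on" idea is the sketch below. It is limited to a single Python file and uses the standard-library `ast` module instead of LSP or tree-sitter. The function name `extract_with_dependencies` is hypothetical, and the dependency test (any bare name that matches another top-level definition) is a deliberate simplification.

```python
import ast


def extract_with_dependencies(source: str, symbol: str) -> str:
    """Return the source of `symbol` plus the same-file definitions it references."""
    tree = ast.parse(source)
    # Map each top-level function/class name to its AST node.
    definitions = {
        node.name: node
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    }
    if symbol not in definitions:
        raise KeyError(symbol)

    needed: list[str] = []
    stack = [symbol]
    while stack:
        name = stack.pop()
        if name in needed:
            continue
        needed.append(name)
        # Any bare name inside the definition that matches another
        # top-level definition is treated as a dependency.
        for child in ast.walk(definitions[name]):
            if isinstance(child, ast.Name) and child.id in definitions:
                stack.append(child.id)

    # Emit in original file order so the excerpt reads naturally.
    ordered = [name for name in definitions if name in needed]
    return "\n\n".join(ast.get_source_segment(source, definitions[name]) for name in ordered)
```

Cross-file dependencies would need import resolution on top of this, which is where an LSP server or tree-sitter queries would come in.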

    • anotherpaulg 15 days ago
      This is what aider does, using tree-sitter to extract the AST from each source file. It uses the ASTs to build a call graph, then runs a graph optimization to identify the most relevant parts of the codebase, given the current state of the LLM chat.

      There are more details in this article:

      https://aider.chat/2023/10/22/repomap.html
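
To make the "build a graph, then rank it" idea concrete, here is a rough sketch. It is not aider's code: Python's `ast` module stands in for tree-sitter, the graph is file-level rather than a true call graph, and a plain PageRank-style power iteration stands in for the graph optimization mentioned above. The names `defined_and_used` and `rank_files` are hypothetical.

```python
import ast
from collections import defaultdict


def defined_and_used(source: str) -> tuple[set[str], set[str]]:
    """Return (names defined at top level, names referenced anywhere) for one file."""
    tree = ast.parse(source)
    defined = {
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    }
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    used |= {node.attr for node in ast.walk(tree) if isinstance(node, ast.Attribute)}
    return defined, used


def rank_files(
    files: dict[str, str], damping: float = 0.85, iterations: int = 50
) -> list[tuple[str, float]]:
    """Rank files by how much other files reference the names they define."""
    if not files:
        return []
    info = {path: defined_and_used(src) for path, src in files.items()}
    # Edge A -> B when file A uses a name that file B defines.
    links: dict[str, set[str]] = defaultdict(set)
    for a, (_, used) in info.items():
        for b, (defined, _) in info.items():
            if a != b and used & defined:
                links[a].add(b)

    n = len(files)
    rank = {path: 1.0 / n for path in files}
    for _ in range(iterations):
        new = {path: (1.0 - damping) / n for path in files}
        for a in files:
            # Files with no outgoing edges spread their rank evenly.
            targets = links[a] or set(files)
            share = damping * rank[a] / len(targets)
            for b in targets:
                new[b] += share
        rank = new
    return sorted(rank.items(), key=lambda item: item[1], reverse=True)
```

Taking the top-ranked files (or their signatures) until the token budget is spent gives a crude repo map. Weighting edges by the files already in the chat would be the next step toward what the comment describes.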

      • _andrei_ 14 days ago
        Ah wasn't aware, nice work!
      • arthurcolle 15 days ago
        aider is super limited. solid approach but needs a lot of work to make it usable
    • bredren 15 days ago
      Cool, thanks for sharing.

      That's a great point.

      A tree-based file path browser with ability to select all or individual functions or classes would be cool.

      JetBrains IDEs have a good interface for symbols via the refactoring UI. Maybe I'll look there for some inspiration.

  • Jimmc414 15 days ago
    This is nice. I created something similar, https://github.com/jimmc414/1filellm

    It converts papers, repositories, PRs, YouTube transcripts and web docs into one text file in the clipboard for LLM ingestion.

    • Sakos 14 days ago
      This stuff sounds cool, but doesn't it quickly run into token/context limits on the models?
      • Jimmc414 14 days ago
        Not anymore in the subscription LLM offerings. Claude seems to allow 70k tokens or more in their paid UI, ChatGPT seems to be about half of that while custom GPTs allow well over 100k.
  • smcleod 14 days ago
    I use code2prompt (https://github.com/mufeedvh/code2prompt) with the following zsh wrapper:

      function code2prompt() {
    
        # wrap the code2prompt command in a function that sets a number of default excludes
        # https://github.com/mufeedvh/code2prompt/
    
        local arguments excludeFiles excludeFolders templatesFolder excludeExtensions
        
        templatesFolder="${HOME}/git/code2prompt/templates"
        # build the list with += so continuation-line indentation never ends up inside the value
        excludeFiles=".editorconfig,.eslintignore,.eslintrc,tsconfig.json,.gitignore,.npmrc,LICENSE,esbuild.config.mjs,manifest.json,package-lock.json,"
        excludeFiles+="version-bump.mjs,versions.json,yarn.lock,CONTRIBUTING.md,CHANGELOG.md,SECURITY.md,.nvmrc,.env,.env.production,.prettierrc,.prettierignore,.stylelintrc,"
        excludeFiles+="CODEOWNERS,commitlint.config.js,renovate.json,pre-commit-config.yaml,.vimrc,poetry.lock,changelog.md,contributing.md,.prettierrc.json,"
        excludeFiles+=".prettierrc.yml,.prettierrc.js,.eslintrc.js,.eslintrc.json,.eslintrc.yml,.eslintrc.yaml,.stylelintrc.js,.stylelintrc.json,.stylelintrc.yml,.stylelintrc.yaml"
        excludeFolders="screenshots,dist,node_modules,.git,.github,.vscode,build,coverage,tmp,out,temp,logs"
        # += avoids embedding continuation-line indentation; repeated entries (bin, exe, app, tmp, ico) are listed once
        excludeExtensions="png,jpg,jpeg,gif,svg,mp4,webm,avi,mp3,wav,flac,zip,tar,gz,bz2,7z,iso,bin,exe,app,dmg,deb,rpm,apk,fig,xd,blend,fbx,obj,tmp,swp,"
        excludeExtensions+="lock,DS_Store,sqlite,log,sqlite3,dll,woff,woff2,ttf,eot,otf,ico,icns,csv,doc,docx,ppt,pptx,xls,xlsx,pdf,cmd,bat,dat,baseline,ps1,diff,bmp"
    
        echo "---"
        echo "Available templates:"
        ls -1 "$templatesFolder"
        echo "---"
    
        echo "Excluding files: $excludeFiles"
        echo "Excluding folders: $excludeFolders"
        echo "Run with -nn to disable the default excludes"
    
        # array of build arguments
        arguments=("--tokens")
    
        # if -t and a template name is provided, append the template flag with the full path to the template to the arguments array
        if [[ $1 == "-t" ]]; then
          arguments+=("--template" "$templatesFolder/$2")
          shift 2
        fi
    
        if [[ $1 == "-nn" ]]; then
          command code2prompt "${arguments[@]}" "${@:2}" # remove the -nn flag
        else
          command code2prompt "${arguments[@]}" --exclude-files "$excludeFiles" --exclude-folders "$excludeFolders" --exclude "$excludeExtensions" "$@" # "$@" keeps each argument as its own word
        fi
      }
  • mutant 13 days ago
  • acbart 15 days ago
    "Isn't this just a GUI for the cat command" "Oh. That's the joke."
  • reidbarber 15 days ago
    Nice! I made something similar but for the browser recently: https://files2prompt.com

    I think there are some CLI tools out there as well.

    • levysoft 14 days ago
      Thank you for sharing, I found it really useful and well done! You did a really great job!
      • reidbarber 11 days ago
        Thanks, I appreciate it! I hope to keep adding features. If there are missing features that you'd use, feel free to leave them in a reply.
  • Birdguy05761 15 days ago
    This is sweet!! Organizing source docs manually is so tedious
  • bredren 15 days ago
    If you need to feed multiple files to ChatGPT or another LLM, this makes it way easier than manually copying and pasting.

    This app shows you a file modal. Use Shift or Option keys to select multiple text files across one or more directories.

    All of the selected files will be concatenated for easy select all / paste into your LLM conversation.

    Output format of selected files is:

      ### `[filepath]`
      [file contents]
      ### `[filepath]`
      ... and so on.
    
      - Output is in a text field for easy copy-pasta.
      - File path starts at the common parent of all selected files
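
For anyone who wants the same output format without the GUI, here is a minimal Python sketch. The function name `build_prompt` is hypothetical. It assumes paths are shown relative to the common parent directory of the selected files (whether the app also includes the parent directory's own name isn't stated) and that all files are UTF-8 text.

```python
import os


def build_prompt(paths: list[str]) -> str:
    """Concatenate files under '### `relative/path`' headers."""
    absolute = [os.path.abspath(p) for p in paths]
    # Paths are shown relative to the deepest directory shared by every file.
    # os.path.commonpath raises ValueError for an empty selection.
    parent = os.path.commonpath([os.path.dirname(p) for p in absolute])
    sections = []
    for path in absolute:
        with open(path, encoding="utf-8") as handle:
            contents = handle.read()
        rel = os.path.relpath(path, parent)
        sections.append(f"### `{rel}`\n{contents.rstrip()}")
    return "\n".join(sections) + "\n"
```

Calling `print(build_prompt(["src/app.py", "docs/notes.md"]))` (example paths only) would emit each file under its own header, ready to paste into a conversation.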