Looking at ChatGPT and its clones, they do support some forms of markup, but they make it all the more apparent that GUI advice is much harder to communicate or act on than text. Further, many of the problems of using text, for example discoverability, are alleviated by such assistants. So is this going to swing the pendulum back, or will it take another direction?
I doubt it. OpenAI is eagerly working on voice and image. They want it to be AGI; basically, you'd communicate with it exactly as you would with a human.
I've been mostly just sending it screenshots and photos lately - it's able to handle that faster than text.
Instead of asking it to check your PR, screenshot the PR diff. Instead of giving it logs, screenshot the error message and the log - apparently it understands color-coded logs better. If you want it to do HackerRank, well, you can't copy-paste from there during an interview, but you can take photos with your phone.
Unless I've misunderstood, it is most effective on a picture of text, and it has to answer in text. It is extremely difficult for it to guide you through a GUI, or to give you a sequence of steps you might want to tweak a little, without forcing you to study exactly what it is doing rather than just cutting and pasting its output into a text UI.
It's hard for me to imagine multiple AGI-wrapped interfaces using some other input, i.e. emulated remote desktops and screen shares (chained so that one AGI's output could feed another interface's input), but I feel that adding all of this data ultimately makes it harder to proofread and adapt what the AGI proposes and then automate its repeated use (the way you can with scripts or code).
One of my other top use cases for it is getting it to read docs. It will give me step-by-step instructions to, say, deactivate Facebook or do whatever with AWS. Sometimes I get stuck, so I send it a screenshot and it'll tell me that the button is actually a tab, or on the left, or that I need to scroll down, etc.
Chained data will likely have a hard time. Most of these wrapper startups will probably have a hard time. I tried to make an AI wrapper startup but I couldn't. It's a rare time when the unicorns with huge teams are actually moving faster than the solo devs. It's almost like they were aided by AI or something.
So, for example, when it gave me instructions for Evolution mail settings it hallucinated a button; to discover this I had to correctly follow half the instructions and then read them carefully again. With a text UI one just pastes them, and the interpreter identifies the first incorrect line.
I think it has always been the case that a GUI is bad for communicating instruction about itself; in a world where everyone is an autodidact and gets very little help, a GUI could win on other things, like helping the user figure out what to do or recover a memory of how to do it.
I'm not really focusing here on the GPT interface itself, but if it could wrap all sorts of interfaces, then text or GUI ones could effectively be replaced, with users rarely touching them directly. But I think such AI interfaces would put themselves at a disadvantage by not working with text as the medium between them.