"Extensions" and integration into the rest of the Google ecosystem could be how Bard wins at the end of the day. There are many tasks where I'd prefer an integration with my email/docs over a slightly smarter LLM. Unlike OpenAI with ChatGPT plugins, Google has the luxury of fine-tuning its model for each of its integrations.
The new feature for enriching outputs with citations from Google Search is also pretty cool.
Yes, exactly. Integration is where the real power of these agents can live.
I really want an agent that can help me with pretty simple tasks
- Hey agent, remember this link and that it is about hyper fast, solar powered, vine ripened, retroencabulators.
- Hey agent, remember that me and Bob Retal talked about stories JIRA-42 and JIRA-72 and we agreed to take actions XYZ
- Hey agent, schedule a zoom meeting with Joe in the afternoon next Tuesday.
- Hey agent, what did I discuss with Bob last week?
Something with retrieval and functional capability could easily end up being easier to use than the actual UIs that are capable of doing these kinds of things now.
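The "remember this / what did we discuss" loop doesn't need much machinery, either. Here's a minimal toy sketch of it (all names are my own, there's no LLM involved, and retrieval is plain keyword overlap):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Memory:
    text: str
    tags: set        # lowercase words from the note, for matching
    when: datetime

class Agent:
    """Toy personal agent: 'remember' notes, retrieve them later."""

    def __init__(self):
        self.memories = []

    def remember(self, text, when=None):
        tags = {w.strip(".,?").lower() for w in text.split()}
        self.memories.append(Memory(text, tags, when or datetime.now()))

    def recall(self, query):
        q = {w.strip(".,?").lower() for w in query.split()}
        # rank stored notes by how many words they share with the query
        scored = sorted(self.memories,
                        key=lambda m: len(m.tags & q), reverse=True)
        return [m.text for m in scored if m.tags & q]

agent = Agent()
agent.remember("Talked with Bob about stories JIRA-42 and JIRA-72")
agent.remember("Saved link about retroencabulators")
print(agent.recall("What did I discuss with Bob?"))
```

A real agent would swap the keyword overlap for embeddings and let an LLM phrase the recalled notes (and call out to a calendar API for the scheduling cases), but the store-then-retrieve skeleton is the same.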
No doubt about it. Google isn't competing directly with ChatGPT, but is betting that having a small fine-tuned model "close to the data" will dramatically cost-outperform a huge general-purpose LLM. Less resource-intensive inference, less prompt engineering (less noise).
For someone already invested in the Google suite of products (gmail, docs, etc), this sounds pretty useful.
Also, this part seems especially interesting:
> Starting today with responses in English, you can use Bard’s “Google it” button to more easily double-check its answers. When you click on the “G” icon, Bard will read the response and evaluate whether there is content across the web to substantiate it. When a statement can be evaluated, you can click the highlighted phrases and learn more about supporting or contradicting information found by Search.
The biggest problem with all LLMs at the moment is the frequency at which they are wrong (at least when they are used like an internet search to look up factual info). Any LLM that can improve this (or, as in Bard's case, make it easier to detect wrong info) is likely to gain traction.
This feedback loop, when you extend LLMs out to the horizon, is my primary argument against this approach. When 90% of the new training data is content that is (a worse version of) what the model previously generated, you get a feedback loop that drives quality toward zero.
Bard is especially bad right now, at least compared to gpt4. Once they move to Gemini it will be interesting to compare, but until then things like "when does the new Staind album come out" have to be answered the old way with Google search and reading.
I'm sure others will feel differently (and I'm very eager to hear from others with different views), but for me there's not that much difference in usefulness between a model that's wrong 5% of the time and one that's wrong 25% of the time. Both models require manual validation 100% of the time.
Do you feel confident in your ability to detect when the LLM is wrong? For me personally, I don't have the confidence to do this, which is why I feel like I need to verify everything, even in the 5% case.
For me it comes down to "how important is it to be right?" For many of the queries, it's not that important, and if I lose the 5% gamble, it's annoying but ultimately inconsequential. If it's an important thing to be right about, then I will verify it either way.
I don't know how else to ask this: how are you so okay with disregarding accuracy?
What questions are you asking where you don't care if the answer is wrong? I guess I just fundamentally don't understand what the point is. Why not just bookmark the "random article" link on Wikipedia if it doesn't matter anyway?
I encounter situations where I used to rely on Stack Overflow, which carries a similar (if not higher) likelihood of being incorrect, often due to outdated information.
For instance, I was recently inquiring about a specific task with CMake and consulted ChatGPT. Initially, the response was inaccurate, but it was obviously so when it didn’t compile. Upon reprompting, I received the correct answer.
I ask a fair number of code questions where bugs and stuff will be caught either immediately or shortly after first use. If 5% of those are wrong, totally fine with me.
I also sometimes ask questions like "what is the tallest mountain in the US" or "what is the hottest desert on Earth" or similar. If I really need to know that the answer is correct, at a minimum it gives me a name to search for to verify height in feet compared to others, etc.
You have to be okay with being wrong regardless. The little snippets after a Google search are sometimes wrong. The blogs or links can be misinformed (For example, most seem to be wrong about Staind's new album release date).
The idea of perfection is silly because it doesn't exist, LLM or not. You're not going to get it, so it's a matter of how often it's right.
Yeah, hard to blame Bard for getting that wrong when there are tons of semi-legit-looking webpages that have September 15th. Searching "Confessions of the Fallen" "September 15" 2023 yields hundreds of thousands of results which, at least in a cached version, have that date.
Comparatively, ChatGPT says "I'm sorry, but I do not have access to real-time information, and my knowledge only goes up until September 2021. To find out the release date of a new Staind album, I recommend checking the official Staind website, social media profiles, or reputable music news sources for the most up-to-date information."
So which is more useful, one that doesn't even know there is a new album coming out, or one that knows what was its release date as of just a couple of months ago?
> So which is more useful, one that doesn't even know there is a new album coming out, or one that knows what was its release date as of just a couple of months ago?
To semi-misquote Lewis Carroll: Which is better, a stopped clock or a clock which loses a minute a day? Carroll posits the former, as it is precisely correct twice a day. The trick, of course, is knowing for sure when those two times per day will be.
Did it say September 15 or 18? (Both are answers I've gotten from it over the past month or two.) If so, that's wrong. It's September 22. As sibling pointed out, though, there are apparently web pages that have it wrong, so it's probably not Bard's fault. I also wouldn't be shocked if the date changed and was originally 9/15.
>Citing production delays, Staind have now announced that their new album “Confessions Of The Fallen” will arrive a week later than intended. That full-length outing will now be available on September 22nd.
Now this is actually useful. There's a lot of good information in my Gmail but searching it is such a pain that I hardly ever do.
I just asked Bard for the date of an upcoming event and it did the search for me and found the right answer and summarized it with extra detail and references. This is the only reason so far that I'd go to Bard over ChatGPT.
It did treat the @Gmail part as part of the query words though, which is weird. I think it won't be ready for mass consumption until it can decide for itself when to search Gmail or Drive with no weird keywords necessary.
I just used it for 15 minutes and I like the direction Google is going with it. Once you turn Bard extensions on, you use the “@“ character to get a pop up list of services like GMail, Google Drive, and many others; choose one, then ask your question.
First steps, and I look forward to seeing future improvements. I wonder how they will monetize this? I was just using it with my free GMail account.
Both Microsoft, with Office 365, and Google have the customers and web properties that can make good use of new types of LLM applications.
I still think chat is not at all natural for humans to interact with computers, mainly because most people are not actually good at phrasing their needs, and even if they are, typing it out for the LLM to understand takes so much time.
Thanks to the QWERTY layout, our keyboards are not efficient for typing either.
Yes, I agree it should be faster: 100ms between question and concise answer. With standardized phrasing of needs and wants (like Cucumber or Non-Violent Communication), plus a Dvorak keyboard layout or a shorthand input system (yash or bref), you can go a long way. For standard stuff you could always drop to a short Ruby-like syntax.
The existing paradigm is temporary. AI will improve to infer more meaning from what users want, increase their ability to perform it, and chat systems will eventually switch to optional voice input. You tell the system what to do, and it'll perform whatever steps it needs to. I can't imagine anything more natural than that, for the everyday user.
I'm sure things will get more esoteric, for the experienced computer user.
Looking at the difference from GPT-3.5 to GPT-4, you can see these features are already appearing. GPT-4 can do more, understand more types of data, infer more context, and craft more advanced responses. With the right prompting, it can even ask clarifying questions to handle ambiguity. Most of the time when people talk about the limitations of AI, they're talking about limitations that are _very_ close to being improved upon. Context windows are a weakness, but they're growing, and memory features are being developed. Intelligence is improving. Understood data types are expanding. My guess is that GPT-5 (and equivalents) will understand voice natively, instead of needing a speech-to-text model in the middle.
There are absolutely serious limitations to existing AI, but the criticisms mentioned here aren't where we're stuck.
Yes, being able to voice chat with AI is interesting, and we can already do that (e.g., using voice-to-text).
> AI will improve to infer more meaning from what users want
This is true. It already knows what it should do when I give it a piece of code and say "🐛" (the bug emoji) or "spot the bug". If you give it the error code, it already knows it has to find a solution.
But for non-programming tasks, it'll be a challenge to infer what the user wants because language is vague and people are used to speaking ambiguously and clarifying only when the other person doesn't understand.
This morning I asked it if it could query YouTube and I got a short description of the YouTube API. I don't know whether those API details are public.
I then used it to search for videos:
> There are several YouTube videos with a title that begins, "Gregory Blackstock, The Incomplete Historical World," List those videos with the YouTube URLs.
In this particular instance, the titles aren't unique, but the videos are different. The result appeared to be correct. The list was formatted so I could paste it into an email, something that doesn't happen when I search YouTube itself. Nice!
My next prompt was
> Make a list of the top 5 most viewed YouTube videos about moths.
Strangely, the result was 3 groups of 5 videos, with some duplicate results.
> Why did you list 15 videos in your reply when I asked for 5?
Reply: "I apologize for the confusion. I am still under development, and I am always learning.
I listed 15 videos because I found it difficult to narrow down the list to just 5. There are many beautiful and fascinating moth videos on YouTube, and I wanted to share as many as I could.
I will try to do better next time. If you have any feedback on how I can improve my responses, please let me know."
I'm a bit concerned about one of the privacy terms. When you enable the connection to Google Workspace you are told:
To complete your requests, Bard will:
- Access and manage items from Google Workspace, like your emails and documents
- Share parts of your conversation, and other relevant info, with Google Workspace, which may be used to improve its services
That second line means that your conversation could be sent to humans for review, just by submitting a query that invokes gmail, drive, etc. This is in direct contrast to the terms of Bard itself which won't store, use for training, or allow human review of conversations unless you explicitly submit feedback about said conversations.
This amounts to "If you turn on Google Workspace any of your conversations could be seen by humans."
Just tried it for a few minutes with all extensions enabled and it failed at almost every task I proposed.
How many emails are in my gmail? (Completely incapable of answering this, didn't even try, just listed my last ~5 emails)
Based on my gmail account, how many flights have I booked in the last two years? (Also completely incapable, and didn't try, and again listed those same last ~5 emails)
When is the cheapest one way flight from NYC to Bali over the next six months? (Was for some weird reason only capable of returning round trip flights, but it did at least give me a plausible date and list of flights)
I'm probably using it wrong, but not a super "wow" first impression.
I think you need to give it tasks that let it look up and process specific documents. I am not sure they designed it for data mining tasks. And it's probably not as smart as GPT-4, which might break it down into steps like "search for possible flight confirmations" and "validate results".
This seems incredibly dangerous. You're one typo away from having the entirety of your private life exposed by bard.
They will use all of your private data for advertising, and a human will review the data fed into bard. In other words, all of your private information is now reviewed by a human as they see fit. Yuck.
>Please don’t enter confidential information in your Bard conversations or any data you wouldn’t want a reviewer to see or Google to use to improve our products, services, and machine-learning technologies.
Maybe put that one front and center on your bard page, not buried on a completely different website....
>I’m as big of an AI skeptic as anyone but even I believe you can set up an integration of personal information that doesn’t leak publicly.
In my experience, there's a message with every new Bard conversation that links to the page you linked to and says "Human reviewers may process your Bard conversations for quality purposes. Don't enter sensitive info."
I have the integration with google search enabled and now I only use GPT very rarely. Having it at the top of the SERP with zero effort is much more convenient. The fact that it gets many things wrong becomes irrelevant when it takes no effort on my part. Even when it is totally wrong, its feature to intersperse the output with links to its sources is quite nice.
Bard's integration with Gmail is extremely basic -- it's just using regular Gmail search to include a few relevant emails in its prompt and then trying to answer your question. There's no AI on the email side.
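If that's right, the whole pipeline is just retrieval-augmented prompting. A rough sketch of that pattern (the mailbox shape and the `llm()` call are my own stand-ins, not Bard's actual internals):

```python
def search_mail(mailbox, query, k=3):
    """Plain keyword search over emails -- no AI on the email side."""
    terms = query.lower().split()
    scored = [(sum(t in (m["subject"] + " " + m["body"]).lower() for t in terms), m)
              for m in mailbox]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for score, m in scored[:k] if score > 0]

def answer_from_mail(llm, mailbox, question):
    """Stuff the top search hits into the model's prompt, then ask."""
    hits = search_mail(mailbox, question)
    context = "\n---\n".join(f"Subject: {m['subject']}\n{m['body']}" for m in hits)
    prompt = (f"Emails:\n{context}\n\n"
              f"Question: {question}\nAnswer using only the emails above.")
    return llm(prompt)

mailbox = [
    {"subject": "Flight confirmation", "body": "Your flight to Bali is booked."},
    {"subject": "Newsletter", "body": "Weekly digest."},
]
# With llm as a real model call, this would answer from the retrieved email.
print(answer_from_mail(lambda p: p, mailbox, "When is my flight to Bali?"))
```

Which would also explain the failures upthread: a "how many emails" or "count my flights" question can't be answered by stuffing the top few search hits into a prompt.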
For summary extractions I’ve found Bard to be miles better than ChatGPT of late. Waiting for some of my dependent services to add api support for Bard… I’m guessing this will go back and forth a few times over the coming years.
Makes me wonder, are they using everyone's data to train and personalise Bard? It must be incredibly tempting to use Chat AIs to persuade you to buy products advertised by Google in a way that you will perfectly respond to.