We recently tried to build something similar for outbound calls (simple reminders to partners) and faced massive issues using gpt-4o-realtime-audio: noise detection, turn detection, random telephony issues (we were using Twilio too), the prompt not holding together, and more.
We dropped the project because it would have resulted in a terrible experience for the person on the other side of the phone. Building these things is nontrivial.
The plan would have been to A/B test and see what the response would have been (watching NPS and business metrics uplift). Human handoff was always the plan in case things got too tricky for the LLM to handle.
I see some hostility here towards this project, and while I share many concerns, it is very naive to think that these services won’t be massively leveraged going forward. An AI agent can handle things as well as humans (not in our case, but there are good services out there, e.g. Parloa), and the key elements are the same as in all other agent-based workflows:
- narrow use cases
- human in the loop ready to pick up/steer/correct
We will see a lot more of this, and as LLM capabilities improve it will only get better. It is inevitable at this point and might (_might_) result in a better experience for customers in some cases.
Nevertheless I also see the possibility that we will go full circle and we will always reach for a human, maybe showing up in person in a physical office to make sure cases or requests are handled well… or not :-)
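The two key elements listed above (narrow use cases, a human ready to pick up) could be sketched as a small routing rule. This is only an illustration, not how any particular product does it: the intent names, the confidence score, and the thresholds are all hypothetical, and it assumes some upstream classifier labels each caller turn.

```python
from dataclasses import dataclass

# Hypothetical allowlist: the only intents the agent may handle on its own.
SUPPORTED_INTENTS = {"appointment_reminder", "confirm_delivery_slot"}


@dataclass
class Turn:
    intent: str                          # label from an assumed upstream classifier
    confidence: float                    # classifier confidence in [0, 1]
    caller_asked_for_human: bool = False


def route(turn: Turn, failed_turns: int,
          min_confidence: float = 0.8, max_failed_turns: int = 2) -> str:
    """Return "agent" to let the LLM continue, or "human" to hand off."""
    if turn.caller_asked_for_human:
        return "human"                   # an explicit request always wins
    if turn.intent not in SUPPORTED_INTENTS:
        return "human"                   # outside the narrow use case
    if turn.confidence < min_confidence:
        return "human"                   # the agent is not sure what was said
    if failed_turns >= max_failed_turns:
        return "human"                   # the conversation is going in circles
    return "agent"
```

The point of the sketch is that every branch except the last one hands off, so the agent only keeps the call when it is clearly inside the narrow case.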
I've read this comment twice and I genuinely can't understand it.
Uh, so your own attempt at a similar project didn't work and was a terrible experience and the fundamentals of the system are specific and still require babysitting. But it's inevitable (???) that it'll get better... and this improvement only MIGHT make things better for people, only some of the time?
I'm not alone in being unimpressed by this, right? Nothing about what was written here sounds... good? Even the most optimistic part is "well, maybe it might be good, sometimes". Like, this sucks. This is a bad system that doesn't work and makes things worse.
What I mean is: building these systems is nontrivial, but if done well it can help. Imagine not being stuck in an endless queue on a phone call when you have to do a simple task through a customer call center, or getting a phone reminder with more information and less noise than a written notification. The fact that I failed at it (for lack of experience and resources) does not mean it should just be shrugged off as useless or impractical. Some companies offer this service and it works just fine for narrow use cases.
I did some freelancing around LLMs for startups and I can't tell you the amount of times I've had to reject "great ideas" that were just about spamming people. Almost had a big red sign "no outbound!"
Hi, I am Sagar. We just open-sourced a complete framework to build an AI-powered telephony agent that can handle both inbound and outbound calls, using Python, SIP, and cloud LLMs like OpenAI or Gemini.
You can use it to create smart appointment bots, voice feedback collectors, or even enterprise IVR systems. It’s modular (plug in your SIP provider or AI model), production-ready, and extensible for real-time workflows.
Features include:
- SIP & VoIP call handling (Twilio, Plivo, etc.)
- LLM-integrated AI agent (customizable prompt & tools)
I'm still getting to grips with all this AI stuff but I do know SIP n RTP rather well. Your project looks ideal for me to play with.
My initial project will be to replace my aging parent's and my home FreePBX IVR with something a bit more useful. At the moment it requests you press 1 to make the phones ring and that drops pretty much all unwanted calls but that will soon be automatically defeated.
I do also have some rather more commercial ideas but baby steps. Mind you, an RPi-based drop-in box with a decent auto attendant for say £100 hardware, open source software and some fettling time seems a pretty good opportunity. You could go for somewhat cheaper hardware. The UK is being migrated away from copper A and B wires to what is called SOGEA and/or FTTP. Basically it's going to be all VoIP for telephony from pretty much now on. The good old days of an analogue handset that is powered by the line will mostly go away within a few years. On the bright side our signalling was bloody weird and SIP is nearly eternal!
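The "press 1 to make the phones ring" screen described above boils down to a tiny decision rule. A minimal sketch, assuming the PBX hands over the DTMF digit received after each prompt (or `None` on timeout); the function name and the retry limit are made up for illustration and are not FreePBX configuration:

```python
from typing import Iterable, Optional


def screen_call(key_presses: Iterable[Optional[str]], max_prompts: int = 2) -> str:
    """Ring the phones only if the caller presses 1 within max_prompts prompts.

    Each item in key_presses is the DTMF digit received after one playback
    of the prompt, or None if the caller pressed nothing before the timeout.
    """
    for attempt, digit in enumerate(key_presses):
        if attempt >= max_prompts:
            break                # stop re-prompting, treat as unwanted
        if digit == "1":
            return "ring"
    return "hangup"
```

Anything that can hear the prompt and send a "1" passes this check, which is why a plain DTMF screen like this is expected to be defeated by automated callers sooner or later.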
I'll take an alternative view of someone else's comment. No offense, but how much shittier will my customer service experience with my power company or ISP be because of this?
I'm someone who builds these types of voice agent systems for businesses. And the short answer is, I think especially for the bigger corporations, customer service probably will get shittier.
It doesn't have to be inevitable...but the track record of big corps and their view of customer-facing departments as being loss-centers instead of being directly and indirectly connected to retaining and bringing in revenue doesn't leave me optimistic.
For small-to-medium businesses...it just depends. SMBs are typically more sensitive to the opinions and sentiments of their customers (and also bad press) and also typically have workers with relatively more institutional and tacit knowledge compared to big corporations. So I would like to believe more of them will be much more methodical and intentional about their approaches (especially with cautionary tales like Klarna and DuoLingo). There's no reason for any business to ruin their customer service with AI other than impulsive and/or poorly executed decisions being made out of some mix of fear, hype, greed, or willful ignorance. Entirely avoidable.
The bot will have more info and doesn't degrade every few years (as nobody is there to be replaced and lose the tribal knowledge that makes a business run).
It also.. is just a dumb bot and _only_ knows what it is told.
As somebody who works in this area: it depends on how much effort they are willing to invest. It might be better than the current service or it might be worse.
The big question will be the availability of humans as a fallback who actually have some skills.
I still can't wrap my head around how making functional online portals and well written documentation for customers is too hard, but laying off entire call centers and replacing them with a chat bot that relies on (checks notes) a functional set of APIs for tool use and well written documentation for a KB is easier. It looks like all the same effort plus more!
You're not wrong. Businesses that do that are going to be in for a rude surprise. Time will tell if they care about pissing off their customers more than they care about getting rid of workers.
You overestimate customers. Most of them are not going to bother to read documentation. I used to work closely with big telecoms. The number of support calls that are resolved by "make sure that connector is tight" is crazy high (they invented F-type connectors that work even when they are not tightened properly). "Reboot it" solves most of the remaining issues.
IMHO, human-staffed call centers may actually become a good business model in a few years, after everybody fires their personnel to replace them with AI and discovers that in some cases you need humans.
No offense, but how about evaluating the project on its own merits?
In my experience AI based CS agents are not deployed to actually provide customer service. That does not differ too much from the "old school" phone call centre with scripts etc.
Anyway, you can run a phone exchange on a RPi. My favorite anti-marketing thing is a simple IVR: "If you think we would like to speak to you, press 1; if you are making an unsolicited sales call, then hang up". However, that will soon be defeated (but not yet, I'm happy to report). Eventually, I will need the machine to take calls for me. For that I will probably need a system like OP's.
When it comes to telephony, the biggest issue is who pays for the call. For me it is the caller. For some (the US, inter alia) it might be the recipient. The only "cost" to me is my time to take a call.
Now, back to call centres. You want problem resolution and they want call stats. There is a bit of a disconnect. Power companies and other utilities all claim to be competitive but nearly everywhere that is bollocks.
In the UK we have privatised water companies covering regions, but how on earth can I, within the purview of Wessex Water, get my water from say Scotland? Ironically enough SSE (one of the S's is Scottish) is the local area electrical provider. I can choose my electric and gas supplier but not my water supplier. How on earth is that a free market? Well, it might be, but not for me, the consumer. I understand that ISP provision in the US is pretty similar to our water company situation: you have a choice of one.
So, I think, that the problem here is not the AI or the medium or whatever but something far deeper and far more entrenched.
Your shitty customer experience is because the status quo is say 50-100 years old and nothing to do with some nerdy new technology.
A huge number of these jobs will go, yes. A call centre supervisor may be safe taking escalation calls that need a human and so on, but the masses will be replaced by this kind of tech.
However, what will actually happen is society will use these people to lay bricks for houses, care for the elderly or something else. That's honestly a good thing for society as we have massive shortages there, and not a bad thing for the individuals as a whole.
> However, what will actually happen is society will use these people to lay bricks for houses, care for the elderly or something else. That's honestly a good thing for society as we have massive shortages there, and not a bad thing for the individuals as a whole.
Labor "shortages" for those jobs exist because they are not financially attractive. Why is it a "good thing" to eliminate more attractive roles? How does this materially reduce the cost of living, or increase pay for the roles you point to?
Let's say answering a phone is 11/hour and laying bricks is 11.20/hour. No big shock that people will take the phone job, but if you remove that option more people will flow into the laying bricks job.
Also, let's not forget an underlying pillar of society; real-estate must never decrease in value. That doesn't really fit with the theory that we're going to build a lot of real-estate.
One of the pillars of our society is that real-estate value must never go down, and so I'm skeptical that a lot more people are going to start building real-estate.
The question is totally fair, but it's unreasonable (imho) to expect the owners of this project to have to answer it. We are looking at < 500 lines of Python, mostly just gluing together SIP and agents.
My reaction was slightly different: how many companies selling this (meager) service at a high premium will go out of business now that it's free/open?
I'm not convinced that most companies are going to choose shifting all those people to "higher value work", when the alternative is firing at least some of them to improve short-term shareholder profits.
What you describe is going to do the exact opposite! In developing countries like India, people who don't have many skills are going to face the heat anyway.
It turns out many professions are essentially a long loop of repetitive tasks. Think telemarketing, or phone support, for instance. What kind of "higher-value work" would a phone support agent do?
> The goal isn’t to replace people [...] It’s about shifting roles, not eliminating them and doing it responsibly.
With what knowledge you have of the entire history of capitalism ever, do you, genuinely and earnestly, believe this is what's going to happen, and there isn't, perhaps, a different outcome that is more likely?
We just hook up generators to bikes so that the former phone workers can now power the AI that's replaced them. This will eventually be a cheaper alternative to the current power grid as AI electricity consumption increases.
You can use it to create smart appointment bots, voice feedback collectors, or even enterprise IVR systems. It’s modular (plug in your SIP provider or AI model), production-ready, and extensible for real-time workflows.
Features include:
- SIP & VoIP call handling (Twilio, Plivo, etc.)
- LLM-integrated AI agent (customizable prompt & tools)
- FastAPI-based server for routing and control
- Plugins for STT, TTS, sentiment analysis
- Support for Agent2Agent and MCP protocols
GitHub Repo: https://github.com/videosdk-live/agents
Full Blog: https://www.videosdk.live/blog/ai-telephony-agent-inbound-ou...
Would love feedback from anyone working with telephony, LLMs, or real-time automation!
A byproduct is the drop in wages in the bricklayer job, as the call center workers that were fired are now fighting for the bricklayer jobs.
We’ve also built in Human-in-the-Loop support so a person can step in anytime the AI falls short. More on that here: https://docs.videosdk.live/ai_agents/human-in-the-loop
It’s about shifting roles, not eliminating them and doing it responsibly.