This is a brilliant and useful application of LLM technology, I'm impressed.
One question: on the backend, is it downloading each video's CC (closed-caption) transcript and feeding that into a tuned prompt? What happens for videos where this is missing? Asking because I've noticed CC is occasionally unavailable for some YouTube videos.
If you cared to have a fallback, a potentially interesting experiment / solution for such cases is to download the video, extract the audio to a WAV file, then run the audio through Whisper [1] to generate the transcript. On CPUs, it will still be incredibly intensive and slow, generally not much faster than real-time (e.g. a 5 minute clip will take on the order of 5 minutes to transcribe). However, with Whisper running on a fancy GPU it is insanely faster, between 100x and 200x, meaning that even for long videos, generating the transcripts will complete in only a few seconds.
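A minimal sketch of that fallback, assuming `ffmpeg` is on the PATH and the `openai-whisper` package from [1] is installed. The function names, the `base` model size and the 16 kHz mono settings are illustrative choices on my part, not what stepify actually runs, and downloading the video is left out.

```python
import subprocess
from pathlib import Path


def ffmpeg_wav_command(video_path: str, wav_path: str) -> list[str]:
    """Build an ffmpeg command that extracts 16 kHz mono PCM audio."""
    return [
        "ffmpeg", "-y",        # overwrite the output file if it exists
        "-i", video_path,      # input video
        "-vn",                 # drop the video stream
        "-ac", "1",            # mono
        "-ar", "16000",        # 16 kHz sample rate
        "-c:a", "pcm_s16le",   # 16-bit PCM, i.e. a plain WAV
        wav_path,
    ]


def transcribe_video(video_path: str, model_name: str = "base") -> str:
    """Extract the audio with ffmpeg, then transcribe it with Whisper."""
    import whisper  # imported lazily: heavy, optional dependency

    wav_path = str(Path(video_path).with_suffix(".wav"))
    subprocess.run(ffmpeg_wav_command(video_path, wav_path), check=True)
    model = whisper.load_model(model_name)  # picks a GPU if PyTorch sees one
    result = model.transcribe(wav_path)
    return result["text"].strip()
```

Calling `transcribe_video("clip.mp4")` would write `clip.wav` next to the video and return the transcript text; `result["segments"]` also carries per-segment start and end times if those are needed later.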
p.s. Is there any chance you'd open source your code? Or do you plan to turn this into a business? The code itself is exactly a huge moat, and it'd be cool to see how you did this. Cheers.
p.p.s. stepify.tech app is currently crashing out to a heroku error page when I try to submit a YT link.
Thank you!
I'm getting the transcript through an API and feeding it to the GPT. For now, the fallback function for no captions is just to make something out of the description of the video.
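Roughly, that fallback order could be sketched like this. It is only an illustration: the source names and the stand-in lambdas are hypothetical, and the real caption and description lookups would go in their place.

```python
from typing import Callable, Optional

# Each source takes a video ID and returns text, or None if it has nothing.
Source = Callable[[str], Optional[str]]


def first_available_text(
    video_id: str, sources: list[tuple[str, Source]]
) -> tuple[str, str]:
    """Try each (name, source) pair in order; return the first non-empty text."""
    for name, source in sources:
        try:
            text = source(video_id)
        except Exception:
            text = None  # treat a failing source like a missing one
        if text and text.strip():
            return name, text
    raise LookupError(f"no text source available for {video_id}")


# Hypothetical stand-ins for the real caption and description lookups.
sources = [
    ("captions", lambda video_id: None),
    ("description", lambda video_id: "Mount the inverter on the wall."),
]
print(first_available_text("O8eVxRVwlnw", sources))
```

Returning the source name alongside the text makes it easy to label a page as "generated from the description only", and a Whisper step could later be slotted in between the two.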
I really appreciate the suggestion, I'll experiment with using Whisper. Regarding open source or business, I don't really know about that yet. Maybe I'll lean towards the business side to cover the costs and see where this goes.
And sorry for the downtime! API credits ran out. It should be fixed by now.
Eek, so many typos in my comment - but the most egregious was where I meant to convey the code itself is not a huge moat. Even still, no worries if you don't want to give it away, I totally understand.
Definitely try out Whisper after splitting out the audio as a fallback, and don't forget there are other models like WhisperFast that might be slightly less accurate but less resource intensive, and since you're not publishing the captions themselves you don't need it to literally get every word perfect.
Here's an example implementation you may find interesting (it also includes snapshots and links back to the original video) - https://github.com/Yannael/video2blogpost
As someone who can’t stand the modern trend away from text and towards video, I can’t praise this idea enough. The number of circumstances where a video is better than text with some clarifying pictures is quite small
100% agree. Video can be helpful for supplementary illustration, to show exactly how to orient parts in an assembly, etc. but at the cost of (often) sitting through a lot of rambling monologue that is not.
I haven't tried this yet but it would be helpful if each step included a link to the spot in the video where that step is shown, so that in case you need it it's easy to find.
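For what it's worth, the link part is cheap once each step has a start time in seconds (caption and Whisper segments both carry one). A small sketch, with hypothetical function names, using the `t=` parameter that YouTube watch URLs accept:

```python
from urllib.parse import urlencode


def step_link(video_id: str, start_seconds: float) -> str:
    """Return a watch URL that starts playback at the given offset."""
    query = urlencode({"v": video_id, "t": f"{int(start_seconds)}s"})
    return f"https://www.youtube.com/watch?{query}"


def format_timestamp(start_seconds: float) -> str:
    """Render seconds as M:SS, or H:MM:SS for long videos."""
    total = int(start_seconds)
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    return f"{minutes}:{seconds:02d}"
```

The hard part is deciding which transcript segment a generated step came from, which this sketch does not attempt.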
Yeah. The only way to find some written instructions these days is searching for reddit specifically. Which I'm not a big fan of, either.
I've had multiple instances where I had a simple issue with zero decent Google results, and a YouTube result with literally the exact question I had in the title. I had to sift through 12 minutes of "like and subscribe", a dude clicking around in various screens mumbling some stuff... I would have been very happy with a simple blog post
Super interesting. I recently went down the DIY rabbit hole for solar, electricity, etc. I tested out https://stepify.tech/video/O8eVxRVwlnw and looks decent:
1. It took about 45 seconds for the page to load once I put the URL in. You should have a loader on the page showing that the website is "doing something" while the AI transcribes.
2. It would be great to sync the chapters in the YT video with the guide details.
3. Even more advanced would be if specific items like "Drill holes, insert expansion bolts, and secure the inverter to the wall using nuts and washers." showed a timestamp and thumbnail with a link to that part of the video.
4. It would be great to have a checklist functionality (maybe this is the "pro version"). I often do something, get halfway and then need to scrub the YT video to find the specific place where he talks about the action item.
1) Speed : the site is often showing heroku errors. Seems like you are running the entire processing in the request-response cycle. If not already done, please try to use a queueing system to perform async processing - and then let the user know when their video is ready to view as steps (probably via email or browser notifications). This will stop your site from crashing frequently and you'll be able to scale to many users very quickly.
2) Please add link-backs to the specific time in the video from where the step is shown.
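To make the queueing suggestion in 1) concrete, here is a bare-bones sketch using only the standard library. Everything in it is an assumption for illustration: `generate_steps` is a placeholder for the slow transcript and LLM work, and a real deployment would more likely use a separate worker process behind a persistent queue than an in-process thread.

```python
import queue
import threading
import uuid

jobs: dict[str, dict] = {}   # job_id -> {"status": ..., "result": ...}
pending = queue.Queue()      # holds (job_id, video_url) pairs


def generate_steps(video_url: str) -> str:
    """Placeholder for the slow transcript + LLM work (hypothetical)."""
    return f"steps for {video_url}"


def submit(video_url: str) -> str:
    """Called from the request handler: enqueue and return immediately."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"status": "queued", "result": None}
    pending.put((job_id, video_url))
    return job_id


def worker() -> None:
    """Runs outside the request-response cycle and drains the queue."""
    while True:
        job_id, video_url = pending.get()
        jobs[job_id]["status"] = "running"
        try:
            jobs[job_id]["result"] = generate_steps(video_url)
            jobs[job_id]["status"] = "done"
        except Exception as exc:
            jobs[job_id]["result"] = str(exc)
            jobs[job_id]["status"] = "failed"
        finally:
            pending.task_done()


threading.Thread(target=worker, daemon=True).start()
```

The request handler would call `submit(url)` and return the job ID right away; the page can then poll `jobs[job_id]["status"]` (or the user gets notified) instead of holding a request open for 45 seconds.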
Is there a way to request that submitted items be removed? Can you provide a way to contact you, such as an email address? There wasn't one posted on your site.
It's just a suggestion, I mean right now anyone can submit anyone's videos without their consent or ownership verification. How do you plan to handle that? I'm sure there will be folks out there who wouldn't feel comfortable that a site will be scraping their video content attempting to generate a large network of pages on 1 domain with loads of SEO terms. It provides a conflict of interest with the original creators. This conflict of interest is around SEO competition, reducing views from original creators and then there's the other can of worms of any future plans to monetize your site through subscriptions, paid features or ads where you'd be profiting from the content of others without their consent.
I posted one of my videos just to see what would happen and then it created a permanently hosted page on your domain with an AI generated recap of the video. I didn't realize that was going to happen. There was no warning, label of how it works, TOS that I agreed to or options available to make it private and there's no option to delete it. I put in the URL, hit submit and that was it.
It's nothing personal and I hope you don't see this as a deterrent. I'm all for building cool things and generally openly share almost everything for free (I've been blogging and making videos for ~9 years and don't have a single ad on anything I ever posted) but the idea of having inaccurate AI generated content does rub me the wrong way.
> The guides are generated from pure transcript so you don't have to worry about it being AI.
You mentioned it's generated from pure transcripts but most of the phrases used aren't what was mentioned in the video. It looks like a paraphrased version of it but it's also missing all of the details that would allow someone to follow along.
Directly under the video on the page it says "This response is AI generated". On one hand you say it's not AI generated, but on the other hand it is.
Public doesn't mean it's available for someone else to use however they see fit.
That's why we have licenses and YouTube's default license ensures creators retain ownership of what they upload and are protected by copyright. The license allows YouTube to broadcast the content.
> Why do some of you think it is not okay to put YouTube embeds on a website?
YouTube embeds are a different story, that is an official YouTube feature which allows folks to embed a YouTube video on a 3rd party website. I have no problem with that. YouTube even allows creators to enable or disable that on a per video basis. I keep it enabled because it's useful and promotes sharing of the original content as it was delivered.
I don't like making assumptions but look at how responsive the original poster of this thread was to most comments. They replied to a ton of people, but not this comment. They've also made an explicit decision not to include any way to remedy this issue or even contact them through their website. I'll let you draw your own conclusions from that.
I wouldn't have even minded as much if the generated text was good but in this case it was wildly inaccurate and missed all of the details that would have let you follow along without the video. The site's official tagline is "Get a step-by-step tutorial of any video to follow along". If someone sees the text generated they might infer a video was of poor quality because this site claims it can produce a step by step tutorial of ANY video to follow along. That sheds negative light on folks who created the original video.
> Only YouTube and the video owner will earn revenue from ads on embedded videos. The owner of the site where the video is embedded will not earn a share.
Furthermore, the YouTube creator can choose to not let their video be embedded if they wanted that.
Do you have a problem with every news website that has a video at the top, then an article describing what happens in the video? How would that violate the licensing? It's unrelated to licensing - they're using the official YouTube embed. YouTube manages the copyright of the embedded content and can even control whether or not the video can be viewed in your country, etc. based on such restrictions.
> look at how responsive the original poster of this thread was to most comments but they ignore this request
Irrelevant, but I think because it's obvious you're misunderstanding copyright, or because you wrote such a big paragraph with many separate points being made that it's a lot of work to reply to. The copyright in his footer is for his IP, it of course would not apply to the content inside a YouTube embed. And it's not IP theft to summarize a video in what is essentially a blog post.
It's really interesting how some folks don't see this as an additional way to drive traffic to the video, when so many channels have a website of their videos.
This type of tool could help create much more meaningful blog or website type content to build a mailing list around the community.
Who owns the mailing list? Who owns the blog? This random guy who built this tool, or the actual creator who made the content?
The problem with this thought process is that the creator has been taken out of the equation without actually talking to them about it, and when that question gets raised, there just seems to be lots of pushback, likely in part because it touches on the primary complication of LLMs (that a whole bunch of copyrighted content is getting siphoned without considering the people who made that information).
In this particular case, this is literally taking the information from that content and presenting it in a format the creator did not agree to, lowering the potential value of the video to end users. It is much closer to violating the creator’s copyright than generative AI often is.
Instead of pushing back, we need to bring the creators into the discussion to ensure this is something they’re OK with.
The point of a YouTube embed is to share it on other websites, so this is a ridiculous argument. The YouTube creator of course gets the revenue from embed shares, but why do you think they should own every website it's shared on? That makes no sense nor would it ever happen.
The creator isn't taken out of the equation at all - their content is being promoted and they're getting ad revenue for views there (as agreed upon in the YouTube terms).
And for the YouTube creator who decided to give their video to YouTube, but doesn't want it shared on third-party sites, YouTube lets them disable embeds.
Putting a YouTube embed, summarizing a YouTube video - neither are "violating the creator's copyright" which they already gave to YouTube anyway.
Stop bringing up YouTube embeds, it is literally 0% of my point. I literally do not care.
The problem is 100% the use of LLMs to pull the content in an unsanctioned manner. Considering that YouTube has sanctioned methods to share this information, in the form of transcripts, this directly competes with something that the creators are already making in a not-as-good form.
Additionally, it makes the video less desirable. The embed does not matter; creators allow them for a reason. It is the conversion from video to text in a way that the creators did not ask for and likely do not want.
> The problem is 100% the use of LLMs to pull the content in an unsanctioned manner
If the public-facing web wasn't crawlable, Google and many other things wouldn't be possible. What are you saying, that YouTube should not be viewable unless someone is properly authed? Take it up with YouTube - they could require logins to view videos if they wanted, but it would be a worse product.
When a user posts something on YouTube and checks "Allow embeds" they have not only given their video to YouTube but are totally cool with people sharing it around. Who are you - their lawyer? Even most YouTube creators do not share your opinion as I can see most are allowing embeds. The point of mentioning embeds is it's both:
1) Credit to the creator, including the revenue share. You were incorrect when you said it lowers their revenues: they still get the ad revenue from an embed
2) An indicator that the creator wants their content to be shared, since they have the option to disable them if they want and choose not to
The adjacent point you seem to be making is that nobody should be allowed to crawl information and present it as their own. But that's what a lot of the internet is. It's what a search engine is, it's the source of most online encyclopedias, news sites, and so-on.
It just doesn't logically follow that if somebody uploads a video to YouTube, that nobody is allowed to summarize it in text form. That's a very normal thing that is done online.
This isn’t Google though. This is some guy creating a website that will allow him to trend for your term on Google, without the support of the ads that would be financially supporting your content, or any engagement opportunities.
By using an LLM he has created a method to literally outclass every video on YouTube. People will not watch the video, because what’s the point, all the information is already there, taken so fully as to likely not pass fair use claims.
Please stop throwing whataboutisms into this argument. You keep doing it, and they aren’t sticking. This is about this specific site, which takes the full content that people make (not a summary, enough so you can follow along without watching the video, which would go beyond fair use), strips out the ads, and puts them into a format the creator didn’t intend. It is not about embeds, it is not about Google, or aggregators, or scrapers. It is not promotional, and nor does it have any side benefits. It is just taking content that isn’t theirs and packaging it as a modest benefit to the end user.
This site is extremely unethical as is, with limited benefit to the people who actually made the content. No sidestepping or whataboutisms will change that.
I'll steelman the opposite side; YouTube creators have no rights.
When you upload a video to YouTube, you are licensing Google to redistribute a copy of your content at their whims. The uploader agrees to give Google "a worldwide, non-exclusive, royalty-free, sublicensable and transferable license to use, reproduce, distribute, prepare derivative works of, display, and perform" your video.
The video you upload to YouTube isn't yours anymore. You can pretend it is, and play around with it like a little paper doll for all it's worth. You don't own it anymore though, and your right to judge where it is and belongs is stripped the moment you click "Upload".
To the extent taking notes on a video without permission is a thing?
The outcome is not competing with the source video. Anyone who's interested in the topic would likely still choose which video they want to delve into further.
I’m referring to channels which have their own website which lists videos, and maybe summaries like this for the videos.
The email list in that case would belong to whomever owned the site.
Still, a lot of creating is based on other creation.
It’s a reality that this form of a video is akin to taking written notes for it. Maybe what’s upsetting is it’s quite decent at it, perhaps reflecting the work of the developer to get it there. It’s not an easy feat.
Taking notes can’t be illegal and a copyright violation. Neither does expecting people to watch an entire video to recall something make a lot of sense.
If this product provided embeddable summaries for video creators to put into their descriptions it could be pretty useful too to some.
It feels like there is some kind of attachment to video because of how laborious it is to make. Making videos easier to create is an area I’m engaged in. It is about to become much easier, and not from the LLM or generative video side.
The process of editing videos is in the stone ages and quite laborious. Lots of opportunity to improve there, and once videos become easier to make, those for whom simply being able to create was an advantage will have to make sure their videos are even better.
In the comment you're replying to (mine), I literally wrote:
> YouTube embeds are a different story, that is an official YouTube feature which allows folks to embed a YouTube video on a 3rd party website. I have no problem with that. YouTube even allows creators to enable or disable that on a per video basis. I keep it enabled because it's useful and promotes sharing of the original content as it was delivered.
In your comment you've written things like "Furthermore, the YouTube creator can choose to not let their video be embedded if they wanted that." which implies you haven't read the comment I wrote because I mentioned that. I'm also not in disagreement that embedding is generally useful and I support it fully.
That makes me think you might have replied to the wrong person?
> Public doesn't mean it's available for someone else to use however they see fit.
You're implying the embed is being used unethically or in an illegal way that violates copyright - it's not.
> That's why we have licenses and YouTube's default license ensures creators retain ownership of what they upload
Not true: YouTube manages copyright themselves and can even control which countries the video can be viewed in, etc. And the rights are given up by the creator when they agree to YouTube's terms, which grant YouTube a:
“worldwide, non-exclusive, royalty-free, sublicenseable and transferable license to use, reproduce, distribute, prepare derivative works of, display, and perform the Content in connection with the Service”
which of course includes their site embed.
> I have no problem with that. YouTube even allows creators to enable or disable that on a per video basis. I keep it enabled because it's useful and promotes sharing of the original content as it was delivered
If your problem is the fact that the video is summarized in a blog post, tutorial, article, etc. then I still disagree, and maintain that it doesn't violate any copyright - the purpose of the YouTube embed is to display the content on another website.
This one really made me laugh. Good thing the website takes in only transcript to produce the response. This video had none, otherwise it would've been a problem hah.
For the "Paid" or "Pro" version, let me have a browser extension that replaces ALL OF YOUTUBE with your text based breakdowns.
// I'm not really kidding! Because boy do I hate 15 minute videos with the one CLI command you need buried like a needle in a haystack. Seeing the nonsense distilled into a handful of straightforward steps is so refreshing. Awesome work!
You’d have to be lucky to get the correct and complete CLI command from the transcript though, unless this is also doing OCR, which I don’t think it is.
Sorry for that, I'm looking into it. The problem is for videos that have no transcript. Maybe it's because I'm feeding it the description of the video for now. I'll find some workaround for this. Thanks!
> The problem is for videos that have no transcript.
Whisper or other models can help with that too, but remember to preprocess to cut silence. The dataset tends to include ads in the captions, which results in text being hallucinated from silence.
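One way to do that preprocessing is ffmpeg's `silenceremove` audio filter. A sketch, assuming `ffmpeg` is installed; the -40 dB threshold and the 1 second minimum are guesses that would need tuning per source, and the function names are made up for the example.

```python
import subprocess


def trim_silence_command(
    src: str, dst: str, threshold_db: int = -40, min_silence_s: float = 1.0
) -> list[str]:
    """Build an ffmpeg command that removes long silent stretches."""
    audio_filter = (
        "silenceremove="
        "stop_periods=-1:"                  # negative value: also remove silence mid-file
        f"stop_duration={min_silence_s}:"   # only cut silences at least this long
        f"stop_threshold={threshold_db}dB"  # anything quieter counts as silence
    )
    return ["ffmpeg", "-y", "-i", src, "-af", audio_filter, dst]


def trim_silence(src: str, dst: str) -> None:
    """Run the command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(trim_silence_command(src, dst), check=True)
```

One caveat: cutting audio shifts every timestamp after the cut, so if the transcript timings are later used to link back into the video, the removed spans have to be accounted for.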
You could also add a transcript-evaluation step which checks whether this actually looks like a step-by-step video, but I'd consider skipping it for cost and efficiency. Trying to be helpful by evaluating whether the video is instructions or not is added complexity where bugs and strange behavior can creep in.
Consider passing the video and transcripts through SponsorBlock (removing sponsor, self-promo, interaction reminder, intro and outro segments from the videos) before stepifying them; that might help.
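Once the skip ranges are known, applying them to a timestamped transcript is a simple overlap check. A sketch under assumptions: the `(start, end)` pairs are taken to be in seconds and would come from SponsorBlock's crowd-sourced data (fetching them is not shown), and the tuple shapes here are my own choice rather than anything the service prescribes.

```python
Segment = tuple[float, float, str]   # (start_s, end_s, text)
Span = tuple[float, float]           # (start_s, end_s)


def drop_skipped(transcript: list[Segment], skip_spans: list[Span]) -> list[Segment]:
    """Keep only transcript segments that do not overlap any skip span."""

    def overlaps(start: float, end: float) -> bool:
        return any(
            start < skip_end and end > skip_start
            for skip_start, skip_end in skip_spans
        )

    return [seg for seg in transcript if not overlaps(seg[0], seg[1])]


transcript = [
    (0.0, 5.0, "Welcome back to the channel"),
    (5.0, 12.0, "Drill the holes"),
    (12.0, 20.0, "This video is sponsored by"),
    (20.0, 30.0, "Secure the inverter to the wall"),
]
print(drop_skipped(transcript, [(0.0, 5.0), (12.0, 20.0)]))
```

Dropping any segment that merely touches a skip span is the blunt choice; trimming by overlap ratio would lose less real content at segment boundaries.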
Feels like you might have to explicitly ask it not to put "drop a comment below" or "like and subscribe" into the instructions (or strip it from transcripts), since most YouTubers who take YouTube seriously are going to ask...
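The "strip it from transcripts" option could start as a plain regex pass before the text reaches the model. This is just a sketch: the phrase list is illustrative and far from complete, and the names are hypothetical.

```python
import re

# A few common engagement phrases; extend as new ones show up.
BOILERPLATE = re.compile(
    r"like and subscribe"
    r"|subscribe to (?:my|the|our) channel"
    r"|hit the (?:bell|like button)"
    r"|(?:drop|leave) a comment",
    re.IGNORECASE,
)


def strip_engagement_lines(transcript_lines: list[str]) -> list[str]:
    """Drop caption lines that contain a known engagement phrase."""
    return [line for line in transcript_lines if not BOILERPLATE.search(line)]
```

Dropping whole lines can also throw away a real instruction that shares a caption line with the plug, so telling the model in the prompt to ignore calls to action is probably still worth doing alongside this.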
I made something a little similar, but just as a little cli script that I run locally for myself. You can input a url for a YouTube video, podcast link or local audio/video file. It transcribes it with whisper and outputs the full transcript in one text file and I use another model to summarize it into a bullet list in a separate file.
I so appreciate these open source/access models allowing us to build these kinds of tools without having to pay and send our data to openai.
Whisper comes from a different company (OpenAI) than YouTube (Google). YouTube's transcription existed before Whisper too, so I'd suspect Google has had their own for some time.
Whisper's is supposed to be better in some cases, but Google's probably works very well at scale.
This seems like something people on HN have asked for before. I clicked on one Recent video about how to create a simple Flask app in 5 minutes and the instructions seemed good on a cursory view.
I tried entering a new video but I got a Heroku application error. Maybe it's a limits thing.
When I look at the Recent videos, a lot of them are not for instructions/tutorials. Perhaps people do not understand the purpose of this project. Maybe they are just testing it out with non-tutorial content.
Maybe you could add representative videos towards the top so that people would get a better sense of the use of this project?
I don't know why this isn't more popular here. It's a good idea. (Maybe it has already been implemented elsewhere?) Reading is much faster than watching a video for many instruction-based tasks. Good luck!
Yeah, you just said what was on my mind since I launched it. The code I wrote is for tutorial videos. Non-tutorial video responses are just gibberish. The representative videos on the top is a great idea. I'll look into it.
Can you tell me more about the video you entered? Did it have a transcript? How many hours long was it?
If you continue down this road, you should plan to fund the creators that this is siphoning from, or allow them some form of consent to agree to this.
What you are doing is, whether you’ve considered this or not, at risk of harming people who are building around video because it is financially viable. People produce these guides as videos because that’s how they can make money from them, whereas it is much more difficult to do so on a website.
You need to consider the implications of what you’ve built.
Hm, is this the right take? The YouTube player is embedded on the page, giving the creator YouTube views and more exposure. And I think when a person uploads to YouTube the idea is their video will be out there - including in embeds on 3rd party sites.
I just wouldn't use the word "siphoning" here. There are countless blog posts, news articles, how-to guides, etc. that will embed a video like this yet also provide supporting text for readers. I think it's a pretty normal way of sharing content.
I for one am not a person who learns by watching videos, step-by-step guides work better for me. The idea that all those video tutorials could be made available as text-based guides sounds actually very useful - and I would still be very aware of who originated that content as their video is embedded right there.
It would actually be great if when I search for a tutorial and the most relevant result is a video, if my browser could summarize that video the way search engines summarize results at the top or in the side bar.
It’s literally describing the entire video in a way that is intended to replace the purpose of the video, and only displays the video minimally in the context of the website.
It pulls out the information without adding anything of value, while making it impossible for the creator to make money from it.
This happens with text content, too. Ask publishers about the large number of AI rewrites of their content going around.
The issue here is not “consumer value,” it is “publishers not able to make money on their work,” the entire point of my original comment, which your reply doesn’t mention once.
It by definition promotes the video and the creator.
> publishers not able to make money on their work
Again, you're wrong: the creator still gets the view and the ad revenue, not the third-party site where it's embedded. From YouTube:
> Only YouTube and the video owner will earn revenue from ads on embedded videos. The owner of the site where the video is embedded will not earn a share.
I don’t know how you’re missing the bigger picture. It is taking away any reason to actually watch the video, which means nobody makes revenue from it. It is not a promotional tool—it is a replacement for the video.
I understand your take but I don't agree. By your logic no news site could display a video at the top then summarize the video in an article. This is one of the main use cases of the YouTube embed - which gives revenue to the creator when it's played on a third-party site (and the third-party site host gets no revenue) and the YouTube creator has the option to disallow embedding their video if they don't want it embedded anywhere - it's in their control.
The idea that the number of embed plays will be 0 on this site is just unfounded, and untrue as I just watched a video in an embed on this site. That creator just got a view, where otherwise I would have never seen their content, thanks to this website.
A news site can embed their own content and summarize it.
This site does not check if a video can be embedded before attempting to summarize it.
This site exists to take the work of others, regardless of if they want it to be used like this and barely makes any effort to show the source. There's no link to the channel, no credits from the video itself.
> A news site can embed their own content and summarize it.
But not a YouTube embed? Well we just disagree. That's what the embed is for.
Whether or not a video can be embedded is under the control of the creator in their YouTube dashboard, not this third-party site.
> There's no link to the channel, no credits from the video itself.
Not true at all, the creator is listed in a pretty big font size, and there is a literal YouTube embed that links to the video (and channel if you click the avatar). The creator gets the credit and ad revenue from the embed, the third-party site doesn't.
> This site exists to take the work of others
Not true either, see the news site example, or any blog or tutorial site that references videos.
This site summarizes videos regardless of their status on YouTube. Don't support embeds? Tough luck, this site will summarize your work either way and there's just a little box telling you that you could watch the video on YouTube.
This site exists to take the work of others. If not, please provide a link to the YouTube channel made by the person behind stepify and all their videos.
How do you know it doesn't fail to publish a page when there's no video available?
^ Just curious where you got the info. Beyond that:
Being publicly available on YouTube would still mean it is allowed to be summarized by another person. This restriction you speak of has never existed in any copyright law.
Says "Channel: Panigale Enthusiast" and the video is at the top-right of the page very prominently displayed.
On mobile - I agree it should be at the top, but for UI/UX reasons not because they're "siphoning from creators" which they are not - they are literally promoting the creators.
The people making a living from YouTube will make even more money now that their embed is on this site, won't they? Are you implying they don't get the views?
Why isn't it a link to the channel? Where is the description from the video which often includes the credits for the video, including writers, editors, etc?
People won't get views from this. This is taking the work of others.
The site is meant to keep people on the site, not send everyone to YouTube. It's a different form to consume, and a way to narrow down what to watch instead of investing time in videos that don't have the information you might be after.
Many genres of videos don't really matter as much for a summary, so I'm not sure if this is super ubiquitous.
There's a link in the embed itself too - what is your complaint?
They give credit to the creator - is your complaint that they shouldn't be allowed to summarize a video in text format? Like every blog or news that ever embeds a video does?
A small link buried in an embed widget which doesn’t even appear on every video is not sufficient credit.
A YouTube video has a description field, plus a title and the channel name. All of these should be shown at the top of the page, plus video creators should be required to opt-in to this service.
True, but voice recognition errors typically involve an oddly out-of-place word or two which you can usually spot and mentally correct. That's less likely to make you take the wrong series of steps than a completely coherent and topic-relevant "hallucinated" sentence that just happens to not be part of the guide at all.
Edit: although in this instance the LLM pretty heavily editorialises the transcript anyway...
This looks amazing! As a marketer, I often struggle with repurposing long video interviews into shorter tactical videos, and what you built looks promising. I'm excited to check it out!
You should probably rework the recent video thing? Or not. I mean it's engagement, I guess, but I'm pretty sure people are intentionally putting silly videos on the page.
Tried it on one of my latest videos. Interesting results. My video is not quite a tutorial video, so I can understand why the results are not perfect. But it has invented quite a lot of content...
This is a great & useful resource! So many guides on YouTube are unfortunately padded with so much silliness and fluff. Would be great to link out to time codes if possible.
I could have used this on the weekend. I was working on my car, and though I had watched a few videos about removing the door, electrical connections, etc., I missed some of the details, or had to make a mental note of "this, then this, not the other way around".
What I think might be a great addition is if you had a screenshot for each point? Though I'm not sure how you'd figure out which image would best capture the action.
This is cool. I have been doing this a bit more manually, by using a Chrome plugin that does YT summaries and shows transcriptions using Claude. I don’t like those summaries so I paste the transcriptions into ChatGPT (GPT4) with a prompt “Provide detailed study notes of the following video transcript”. That gives me a very similar format to yours. Will have to do some side-by-side comparisons.
Interesting idea, but not quite useful. I tried two: how to replace a fiberglass window screen, and how to replace the "cycle clutch gear" on an IBM Selectric Typewriter.
I'm curious if you noticed certain models worked better for summarizing and converting to steps. For example, in my projects I've found that Gemini outperforms "better" models like GPT for similar use cases, which I guess makes sense given Google's interest in summarization.
> The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.
Hugs all around - I'd take it as a positive feedback. Congrats on the launch!
I am not. I'm from a 3rd world country and trust me when I say this: I've burned through half of my paycheck in a few hours, which is like barely 3 digits.
I would suggest to put it at the top of the instructions.
What would be really useful - as someone else suggested - would be to link directly to relevant parts of the video.
I've been looking for something like this for absolutely ages. If I want to figure out how to fix my cellphone, reset a warning sensor on my auto dashboard or more recently install a NAS box, there's always this long winded YouTube video packed full of ads. Thanks for helping cut through this nonsense.
One question- On the backend, is it downloading each video CC (closed-caption) transcript and feeding that into a tuned prompt? What happens for videos where this is missing? Asking because I've noticed CC is occasionally unavailable for some YouTube videos.
If you cared to have a fallback, a potentially interesting experiment / solution for such cases is to download the video, extract the audio to a WAV file, then run the audio through Whisper [1] to generate the transcript. Using a CPU, it will still be incredibly intensive and slow, generally not much faster than real-time (e.g. a 5 minute clip will take on the order of ~5 minutes to complete transcription). However, with Whisper running on a fancy GPU it is insanely faster, between 100-200x faster, meaning even for long videos, generating the transcripts will complete in only a few seconds.
Great job @aka_sh!
[1] https://github.com/openai/whisper
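For anyone who wants to try the fallback described above, here is a rough sketch of the "extract audio, then transcribe" part. It assumes the video file is already downloaded, that `ffmpeg` is on the PATH, and that the `openai-whisper` package from [1] is installed; the function names and the `base` model choice are just illustrative, not what the site actually does.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(video_path: str, wav_path: str) -> list[str]:
    """Build an ffmpeg command that extracts mono 16 kHz PCM audio."""
    return [
        "ffmpeg", "-y",        # overwrite the output file without prompting
        "-i", video_path,      # input video file (assumed already downloaded)
        "-vn",                 # drop the video stream
        "-ac", "1",            # downmix to mono
        "-ar", "16000",        # resample to 16 kHz
        "-c:a", "pcm_s16le",   # 16-bit PCM, i.e. a plain WAV
        wav_path,
    ]


def transcribe_fallback(video_path: str, model_name: str = "base") -> str:
    """Extract audio with ffmpeg, then transcribe it with Whisper."""
    import whisper  # lazy import: only needed when captions are missing

    wav_path = str(Path(video_path).with_suffix(".wav"))
    subprocess.run(build_ffmpeg_cmd(video_path, wav_path), check=True)
    model = whisper.load_model(model_name)  # uses a GPU if one is available
    result = model.transcribe(wav_path)
    return result["text"]
```

The lazy import keeps the heavy Whisper dependency off the normal path, so it only loads for videos that have no captions.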
p.s. Is there any chance you'd open source your code? Or do you plan to turn this into a business? The code itself is exactly a huge moat, and it'd be cool to see how you did this. Cheers.
p.p.s. stepify.tech app is currently crashing out to a heroku error page when I try to submit a YT link.
Keep up the good execution.
There is limited need to reinvent the wheel to process audio when other things can be solved.
I haven't tried this yet but it would be helpful if each step included a link to the spot in the video where that step is shown, so that in case you need it it's easy to find.
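A rough sketch of what that could look like, assuming each generated step can be tied back to the start time (in seconds) of the transcript segment it came from. The helper names are made up for illustration; the `t=` query parameter is the usual way to deep-link into a YouTube video.

```python
def timestamp_link(video_id: str, start_seconds: float) -> str:
    """Return a YouTube URL that starts playback at the given offset."""
    t = max(0, int(start_seconds))  # whole seconds, never negative
    return f"https://www.youtube.com/watch?v={video_id}&t={t}s"


def format_timecode(start_seconds: float) -> str:
    """Format an offset as M:SS or H:MM:SS for display next to a step."""
    total = max(0, int(start_seconds))
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    return f"{minutes}:{seconds:02d}"
```

Each step in the guide could then show the timecode as the link text and the deep link as the target.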
I've had multiple instances where I had a simple issue with zero decent Google results, and a YouTube result with literally the exact question I had in the title. I had to sift through 12 minutes of "like and subscribe", a dude clicking around in various screens mumbling some stuff... I would have been very happy with a simple blog post
1. It took about ~45 seconds for the page to load once I put the URL in. You should have a loader on a page showing that the website is "doing something" while the AI transcribes.
2. It would be great to sync the chapters in the YT video with the guide details.
3. Even more advanced would be if the specific items like "Drill holes, insert expansion bolts, and secure the inverter to the wall using nuts and washers." showed a timestamp and thumbnail with a link to that part of the video.
4. It would be great to have a checklist functionality (maybe this is the "pro version"). I often do something, get halfway and then need to scrub the YT video to find the specific place where he talks about the action item.
EDIT:
5. IMO iFixit has the best "guide" formatting: https://www.ifixit.com/Guide/How+to+Recover+Data+From+a+MacB... If you could somehow generate this from the video, that would be insanely useful.
1) Speed: the site is often showing Heroku errors. It seems like you are running the entire processing in the request-response cycle. If not already done, please try to use a queueing system to perform async processing - and then let the user know when their video is ready to view as steps (probably via email or browser notifications). This will stop your site from crashing frequently and you'll be able to scale to many users very quickly.
2) Please add link-backs to the specific time in the video from where the step is shown.
Cheers!
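To make the queueing suggestion concrete, here is a minimal in-process sketch using only the Python standard library. It is an illustration under assumptions: `generate_steps` is a hypothetical stand-in for the slow transcript + LLM work, and a real deployment would more likely use a persistent queue and a separate worker process so jobs survive restarts.

```python
import queue
import threading
import uuid

jobs: dict[str, dict] = {}   # job_id -> status record (in-memory only)
pending = queue.Queue()      # job ids waiting to be processed


def generate_steps(video_url: str) -> str:
    """Hypothetical placeholder for the slow transcript + LLM call."""
    return f"steps for {video_url}"


def submit(video_url: str) -> str:
    """Called from the request handler: enqueue and return immediately."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"status": "queued", "url": video_url, "result": None}
    pending.put(job_id)
    return job_id


def worker() -> None:
    """Background loop: process jobs one at a time, outside any request."""
    while True:
        job_id = pending.get()
        if job_id is None:       # sentinel value used to stop the worker
            pending.task_done()
            break
        job = jobs[job_id]
        job["status"] = "running"
        try:
            job["result"] = generate_steps(job["url"])
            job["status"] = "done"
        except Exception as exc:  # record the failure instead of crashing
            job["status"] = "failed"
            job["result"] = str(exc)
        finally:
            pending.task_done()
```

The request handler would call `submit(...)` and return the job id right away; the page can then poll the job status (or the user gets notified) instead of holding the request open for 45 seconds. The worker is started once with `threading.Thread(target=worker, daemon=True).start()`.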
Heroku just wants a bigger bill.
Is there a way to request that items that were submitted get removed? Can you provide a way to contact you such as an email address? There wasn't one posted on your site.
It's just a suggestion, I mean right now anyone can submit anyone's videos without their consent or ownership verification. How do you plan to handle that? I'm sure there will be folks out there who wouldn't feel comfortable that a site will be scraping their video content attempting to generate a large network of pages on 1 domain with loads of SEO terms. It provides a conflict of interest with the original creators. This conflict of interest is around SEO competition, reducing views from original creators and then there's the other can of worms of any future plans to monetize your site through subscriptions, paid features or ads where you'd be profiting from the content of others without their consent.
I posted one of my videos just to see what would happen and then it created a permanently hosted page on your domain with an AI generated recap of the video. I didn't realize that was going to happen. There was no warning, label of how it works, TOS that I agreed to or options available to make it private and there's no option to delete it. I put in the URL, hit submit and that was it.
It's nothing personal and I hope you don't see this as a deterrent. I'm all for building cool things and generally openly share almost everything for free (I've been blogging and making videos for ~9 years and don't have a single ad on anything I ever posted) but the idea of having inaccurate AI generated content does rub me the wrong way.
> The guides are generated from pure transcript so you don't have to worry about it being AI.
You mentioned it's generated from pure transcripts but most of the phrases used aren't what was mentioned in the video. It looks like a paraphrased version of it but it's also missing all of the details that would allow someone to follow along.
Directly under the video on the page it says "This response is AI generated". On one hand you say it's not AI generated but then on the other hand it is.
That's why we have licenses and YouTube's default license ensures creators retain ownership of what they upload and are protected by copyright. The license allows YouTube to broadcast the content.
Why do some of you think it is not okay to put YouTube embeds on a website???
YouTube embeds are a different story, that is an official YouTube feature which allows folks to embed a YouTube video on a 3rd party website. I have no problem with that. YouTube even allows creators to enable or disable that on a per video basis. I keep it enabled because it's useful and promotes sharing of the original content as it was delivered.
I have a problem with a 3rd party site taking a video and making a derivative of it without the consent of the copyright owner. It's violating the license that the video was uploaded under. They even went as far as explicitly claiming copyright ownership on all content on their site (at the time of this comment their footer reads: "© 2024 Stepify - All rights reserved.").
I don't like making assumptions but look at how responsive the original poster of this thread was to most comments. They replied to a ton of people, but not this comment. They've also made an explicit decision not to include any way to remedy this issue or even contact them through their website. I'll let you draw your own conclusions from that.
I wouldn't have even minded as much if the generated text was good but in this case it was wildly inaccurate and missed all of the details that would have let you follow along without the video. The site's official tagline is "Get a step-by-step tutorial of any video to follow along". If someone sees the text generated they might infer a video was of poor quality because this site claims it can produce a step by step tutorial of ANY video to follow along. That sheds negative light on folks who created the original video.
> Only YouTube and the video owner will earn revenue from ads on embedded videos. The owner of the site where the video is embedded will not earn a share.
Furthermore, the YouTube creator can choose to not let their video be embedded if they wanted that.
Do you have a problem with every news website that has a video at the top, then an article describing what happens in the video? How would that violate the licensing? It's unrelated to licensing - they're using the official YouTube embed. YouTube manages the copyright of the embedded content and can even control whether or not the video can be viewed in your country, etc. based on such restrictions.
> look at how responsive the original poster of this thread was to most comments but they ignore this request
Irrelevant, but I think because it's obvious you're misunderstanding copyright, or because you wrote such a big paragraph with many separate points being made that it's a lot of work to reply to. The copyright in his footer is for his IP, it of course would not apply to the content inside a YouTube embed. And it's not IP theft to summarize a video in what is essentially a blog post.
This type of tool could help create much more meaningful blog or website type content to build a mailing list around the community.
The problem with this thought process is that the creator has been taken out of the equation without actually talking to them about it, and when that question gets raised, there just seems to be lots of pushback, likely in part because it touches on the primary complication of LLMs (that a whole bunch of copyrighted content is getting siphoned without considering the people who made that information).
In this particular case, this is literally taking the information from that content and presenting it in a format the creator did not agree to, lowering the potential value of the video to end users. It is much closer to violating the creator’s copyright than generative AI often is.
Instead of pushing back, we need to bring the creators into the discussion to ensure this is something they’re OK with.
The creator isn't taken out of the equation at all - their content is being promoted and they're getting ad revenue for views there (as agreed upon in the YouTube terms).
And for the YouTube creator who decided to give their video to YouTube, but doesn't want it shared on third-party sites, YouTube lets them disable embeds.
Putting a YouTube embed, summarizing a YouTube video - neither are "violating the creator's copyright" which they already gave to YouTube anyway.
The problem is 100% the use of LLMs to pull the content in an unsanctioned manner. Considering that YouTube has sanctioned methods to share this information, in the form of transcripts, this directly competes with something that the creators are already making in a not-as-good form.
Additionally, it makes the video less desirable. The embed does not matter; creators allow them for a reason. It is the conversion from video to text in a way that the creators did not ask for and likely do not want.
If the public-facing web wasn't crawlable, Google and many other things wouldn't be possible. What are you saying, that YouTube should not be viewable unless someone is properly authed? Take it up with YouTube - they could require logins to view videos if they wanted but it would be a worse product.
When a user posts something on YouTube and checks "Allow embeds" they have not only given their video to YouTube but are totally cool with people sharing it around. Who are you - their lawyer? Even most YouTube creators do not share your opinion as I can see most are allowing embeds. The point of mentioning embeds is it's both:
1) Credit to the creator, including the revenue share. You were incorrect when you said it lowers their revenues - they still get the ad revenue from an embed
2) An indicator that the creator wants their content to be shared, since they have the option to disable them if they want and choose not to
The adjacent point you seem to be making is that nobody should be allowed to crawl information and present it as their own. But that's what a lot of the internet is. It's what a search engine is, it's the source of most online encyclopedias, news sites, and so-on.
It just doesn't logically follow that if somebody uploads a video to YouTube, that nobody is allowed to summarize it in text form. That's a very normal thing that is done online.
By using an LLM, he has created a method to literally outclass every video on YouTube. People will not watch the video, because what’s the point, all the information is already there, taken so fully as to likely not pass fair use claims.
Please stop throwing whataboutisms into this argument. You keep doing it, and they aren’t sticking. This is about this specific site, which takes the full content that people make (not a summary, enough so you can follow along without watching the video, which would go beyond fair use), strips out the ads, and puts them into a format the creator didn’t intend. It is not about embeds, it is not about Google, or aggregators, or scrapers. It is not promotional, nor does it have any side benefits. It is just taking content that isn’t theirs and packaging it as a modest benefit to the end user.
This site is extremely unethical as is, with limited benefit to the people who actually made the content. No sidestepping or whataboutisms will change that.
YouTube channels routinely analyze each other for content and do it as well “in their own way”.
Packaging content in a different way is similar to repurposing content for different social media platforms.
Heck, some people share other people's content on different platforms.
This site is putting a particular lens on a video and sharing the perspective it generates for those who like it.
If it became a product for video creators to generate content for a better description they might never need the site.
When you upload a video to YouTube, you are licensing Google to redistribute a copy of your content at their whims. The uploader agrees to give Google "a worldwide, non-exclusive, royalty-free, sublicensable and transferable license to use, reproduce, distribute, prepare derivative works of, display, and perform" your video.
The video you upload to YouTube isn't yours anymore. You can pretend it is, and play around with it like a little paper doll for all it's worth. You don't own it anymore though, and your right to judge where it is and belongs is stripped the moment you click "Upload".
The outcome is not competing with the source video. Anyone who's interested in the topic would likely still choose which video they want to delve into further.
Sometimes you want to listen without having to watch.
Other times you want to read a summary to see if it’s worth watching instead of lighting 20 minutes of your life on fire that you’ll never get back.
Time is the ultimate currency and it’s not so bad to consider how to protect it by using it with the length of time you’re choosing to invest.
The email list in that case would belong to whomever owned the site.
Still, a lot of creating is based on other creation.
It’s a reality that this form of a video is akin to taking written notes for it. Maybe what’s upsetting is it’s quite decent at it, perhaps reflecting the work of the developer to get it there. It’s not an easy feat.
Taking notes can’t be illegal and a copyright violation. Neither does expecting people to watch an entire video to recall something make a lot of sense.
If this product provided embeddable summaries for video creators to put into their descriptions it could be pretty useful too to some.
It feels like there is some kind of attachment to video because of how laborious it is. Making videos easier to create is an area I’m engaged in. It is about to become much easier, and not from the LLM or generative video side.
The process of editing videos is in the stone ages and quite laborious. There is lots of opportunity to improve there, and once videos become easier to make, those for whom simply being able to create was an advantage will have to make sure their videos are even better.
In the comment you're replying to (mine), I literally wrote:
> YouTube embeds are a different story, that is an official YouTube feature which allows folks to embed a YouTube video on a 3rd party website. I have no problem with that. YouTube even allows creators to enable or disable that on a per video basis. I keep it enabled because it's useful and promotes sharing of the original content as it was delivered.
In your comment you've written things like "Furthermore, the YouTube creator can choose to not let their video be embedded if they wanted that." which implies you haven't read the comment I wrote because I mentioned that. I'm also not in disagreement that embedding is generally useful and I support it fully.
That makes me think you might have replied to the wrong person?
You're implying the embed is being used unethically or in an illegal way that violates copyright - it's not.
> That's why we have licenses and YouTube's default license ensures creators retain ownership of what they upload
Not true: YouTube manages copyright themselves and can even control which countries the video can be viewed in etc. And the rights are given up by the creator when they agree to YouTube's terms which grants YouTube a:
which of course includes their site embed.
> I have no problem with that. YouTube even allows creators to enable or disable that on a per video basis. I keep it enabled because it's useful and promotes sharing of the original content as it was delivered
If your problem is the fact that the video is summarized in a blog post, tutorial, article, etc. then I still disagree, and maintain that it doesn't violate any copyright - the purpose of the YouTube embed is to display the content on another website.
I hope that didn’t wreck your compute costs
// I'm not really kidding! Because boy do I hate 15 minute videos with the one CLI command you need buried like a needle in a haystack. Seeing the nonsense distilled into a handful of straightforward steps is so refreshing. Awesome work!
Giving the 15 seconds up front and then explaining it in more and more detail can also be appreciated by users.
“Seek feedback from stakeholders or viewers by encouraging questions and comments for further engagement.”
This is from a bathroom remodel video.
Whisper or other models can help with that too, but remember to preprocess to cut silence. The training data tends to include ads in the captions, which results in text being hallucinated from silence.
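As a very crude illustration of "cut silence before transcribing", the sketch below drops fixed-size frames whose RMS energy falls under a threshold. It assumes the audio is already decoded into a list of float samples in the range -1.0 to 1.0; the frame length and threshold are arbitrary guesses, and a proper voice-activity detector would do a better job.

```python
import math


def rms(frame: list[float]) -> float:
    """Root-mean-square energy of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def drop_silence(samples: list[float], frame_len: int = 480,
                 threshold: float = 0.01) -> list[float]:
    """Keep only frames whose RMS energy is at or above the threshold."""
    kept: list[float] = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        if frame and rms(frame) >= threshold:
            kept.extend(frame)
    return kept
```

One caveat: removing frames shifts every later timestamp, so if the steps are supposed to link back to times in the video, the offsets of the dropped frames would need to be tracked as well.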
You could also add a transcript-evaluation step which checks whether this actually looks like a step-by-step video, but I'd consider skipping it for cost and efficiency. Trying to be helpful by evaluating whether the video is instructions or not is added complexity where bugs and strange behavior can creep in.
I so appreciate these open source/access models allowing us to build these kinds of tools without having to pay and send our data to OpenAI.
Whisper's is supposed to be better in some cases, but Google's probably works very well at scale.
I tried entering a new video but I got a Heroku application error. Maybe it's a limits thing.
When I look at the Recent videos, a lot of them are not for instructions/tutorials. Perhaps people do not understand the purpose of this project. Maybe they are just testing it out with non-tutorial content.
Maybe you could add representative videos towards the top so that people would get a better sense of the use of this project?
I don't know why this isn't more popular here. It's a good idea. (Maybe it has already been implemented elsewhere?) Reading is much faster than watching a video for many instruction-based tasks. Good luck!
Can you tell me more about the video you entered? Did it have a transcript? How many hours long was it?
What you are doing is, whether you’ve considered this or not, at risk of harming people who are building around video because it is financially viable. People produce these guides as videos because that’s how they can make money from them, whereas it is much more difficult to do so on a website.
You need to consider the implications of what you’ve built.
I just wouldn't use the word "siphoning" here. There are countless blog posts, news articles, how-to guides, etc. that will embed a video like this yet also provide supporting text for readers. I think it's a pretty normal way of sharing content.
I for one am not a person who learns by watching videos; step-by-step guides work better for me. The idea that all those video tutorials could be made available as text-based guides sounds actually very useful - and I would still be very aware of who originated that content as their video is embedded right there.
It would actually be great if, when I search for a tutorial and the most relevant result is a video, my browser could summarize that video the way search engines summarize results at the top or in the side bar.
It pulls out the information without adding anything of value, while making it impossible for the creator to make money from it.
This happens with text content, too. Ask publishers about the large number of AI rewrites of their content going around.
The issue here is not “consumer value,” it is “publishers not able to make money on their work,” the entire point of my original comment, which your reply doesn’t mention once.
> publishers not able to make money on their work
Again you're wrong, the creator still gets the view and ad revenue not this third-party site where it's embedded, from YouTube:
https://support.google.com/youtube/answer/132596?hl=en-do-i-...
The idea that the number of embed plays will be 0 on this site is just unfounded, and untrue as I just watched a video in an embed on this site. That creator just got a view, where otherwise I would have never seen their content, thanks to this website.
This site does not check if a video can be embedded before attempting to summarize it.
This site exists to take the work of others, regardless of if they want it to be used like this and barely makes any effort to show the source. There's no link to the channel, no credits from the video itself.
But not a YouTube embed? Well we just disagree. That's what the embed is for.
Whether or not a video can be embedded is under the control of the creator in their YouTube dashboard, not this third-party site.
> There's no link to the channel, no credits from the video itself.
Not true at all, the creator is listed in a pretty big font size, and there is a literal YouTube embed that links to the video (and channel if you click the avatar). The creator gets the credit and ad revenue from the embed, the third-party site doesn't.
> This site exists to take the work of others
Not true either, see the news site example, or any blog or tutorial site that references videos.
Do you know how YouTube embeds work?
This site summarizes videos regardless of their status on YouTube. Don't support embeds? Tough luck, this site will summarize your work either way and there's just a little box telling you that you could watch the video on YouTube.
This site exists to take the work of others. If not, please provide a link to the YouTube channel made by the person behind stepify and all their videos.
Again wrong - the creator can disable embeds for any video on their YouTube dashboard.
How do you know it doesn't fail to publish a page when there's no video available?
^ Just curious where you got the info. Beyond that:
Being publicly available on YouTube would still mean it is allowed to be summarized by another person. This restriction you speak of has never existed in any copyright law.
Summary just like any other video but just the standard box when embedding is disabled.
You should try it for yourself with one of your own YouTube videos from your own channel.
But there’s lots of no embed videos on YouTube. You can also just try it with one of your own videos.
It can be an invitation to the entire video because a summary may not cover all the details.
Maybe you are noticing some of the summaries might be better than the videos themselves, writing-wise?
The folks taking relatively simple videos and extending it out into a video script to milk watch time probably just need to make better scripts.
No attribution is provided. There's no credit given. The description from the video is not provided.
People make their living from producing YouTube videos. This isn't cool at all.
Says "Channel: Panigale Enthusiast" and the video is at the top-right of the page very prominently displayed.
On mobile - I agree it should be at the top, but for UI/UX reasons not because they're "siphoning from creators" which they are not - they are literally promoting the creators.
The people making a living from YouTube will make even more money now that their embed is on this site, won't they? Are you implying they don't get the views?
People won't get views from this. This is taking the work of others.
Many genres of videos don't really matter as much for a summary, so I'm not sure if this is super ubiquitous
It's embedded and says the channel name in a giant font size. The creator gets the views.
> People won't get views from this
Do you know how YouTube embeds work?
The creator doesn't get views unless someone clicks to play their video. Most people won't click to play the video.
Also this site makes no attempt to check if a video can be embedded before displaying the text.
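For what such a check might look like: the YouTube Data API's `videos.list` endpoint, requested with `part=status`, returns a `status.embeddable` flag per video. The sketch below only inspects an already-fetched, already-parsed response; the function name is made up, and nothing in this thread says how (or whether) the site talks to that API.

```python
def is_embeddable(videos_list_response: dict) -> bool:
    """Return True only if the first video in the response allows embedding.

    Expects the parsed JSON of a videos.list call made with part=status.
    A missing video or a missing flag is treated as "not embeddable".
    """
    items = videos_list_response.get("items", [])
    if not items:
        return False  # video not found, private, or removed
    status = items[0].get("status", {})
    return bool(status.get("embeddable", False))
```

A site could run this before generating a page and skip videos where it returns False.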
They give credit to the creator - is your complaint that they shouldn't be allowed to summarize a video in text format? Like every blog or news that ever embeds a video does?
A YouTube video has a description field, plus a title and the channel name. All of these should be shown at the top of the page, plus video creators should be required to opt-in to this service.
That just means you have to worry about voice recognition errors instead.
Edit: although in this instance the LLM pretty heavily editorialises the transcript anyway...
https://stepify.tech/video/1-Rm0mgg2RI
Here's the video for reference:
https://www.youtube.com/watch?v=1-Rm0mgg2RI
https://stepify.tech/video/623AC6a6org
is the first featured video…
In any case, it’s doomed: Google will cut off the access or integrate the feature on their side. They thank you for the proof of concept, though.
https://stepify.tech/video/KafAn1h4x14
Neither were good enough to use.
1) record an SOP using Loom while you narrate, 2) grab a transcript of your narration, 3) feed transcript into ChatGPT to write list of instructions.
Was billed as a way to easily hand off processes to contractors or subordinates.
This seems like a cool riff on that. Neat.
see: https://news.ycombinator.com/item?id=40112792
How are you managing costs and offering this for free?