Ask HN: How to load-balance two 1 Gbps connections with a failover?

Hi! I used to have a single 1 Gbps connection running from ISP A. Unfortunately, it has been spotty, sometimes dropping out for as much as 30 minutes. 1 Gbps fiber is not expensive where I live (~$15/month), so I've decided to just get a second 1 Gbps fiber from ISP B. Since I can't cancel the service from ISP A just yet due to the agreement length (doesn't make financial sense to pay the early cancellation fee), I ended up with two FTTH setups.

I'd like to combine those two connections to:

1. Have a quick failover in case either fiber link fails.

2. Load-balance them for increased maximum throughput across separate connections. I'm not interested in an overly complicated remote setup for combining bandwidth, since neither of the ISPs supports MLPPP and I have no need for higher speeds anyway. In an ideal scenario, two different machines would be able to download at 800-900 Mbps each (which is what I usually get from each connection), saturating both the ISP A and ISP B connections. That means the router would ideally be able to process 2 Gbps of traffic in total.

3. Have the new setup be quiet and small. It'll live in a small-ish cabinet next to the ISP routers (more on that below), and I definitely don't have the space to stick my old PC with PFSense in there.

4. I need a reliable, business-grade solution that will require minimal/no maintenance and will sit there working for years and years. It's unacceptable for my connection to keep cutting off on key meetings, where I'm often the person presenting.

Additional information: No need for it to have Wi-Fi AC (I have a separate AC), and no need for LTE capabilities either. Upload isn't as important; I don't do a lot of hosting here nowadays, since everything moved to the cloud. If it's relevant, both links are 300 Mbps up, so not symmetrical. The configuration and device don't have to be super end-user-friendly (would be nice, though, of course). I'm not a networking expert, but I've been working in IT for quite a bit and I'm sure I can handle some basic network configuration if required (I have some experience with configuring Cisco switches and routers). Hopefully it won't cost that much, but I'm willing to spend upwards of $1000 on this if there is a good reason, as this will be used for my only source of income.

Now, I've done plenty of research and there are a ton of multi-WAN routers, e.g. the TP-Link TL-R605, the Synology RT2600ac, the business-grade Linksys LRT224 and the Peplink Balance line.

I would've ordered one of these by now, but my key question, which I wasn't able to easily research, is: can the WAN connection be a cable to a LAN port on a router from the ISP? I don't have ISP-provided WAN Ethernet cables coming out of the wall, and that's what has always gone into my routers' WAN port. I have two fiber cables coming out of the wall, each connected to a router from its ISP, and those routers have regular LAN GE ports. I've asked the new ISP and they said I'm required to use their fiber router, otherwise my 8h internet-back-up SLA and other maintenance services in case of issues don't apply anymore.

What kind of a device will allow me to connect it via cat6 Ethernet to two different routers on different subnets? Is it what all those "WAN" ports on the multi-WAN routers expect already that I'll do?

The Peplink Balance line looks like it has all of the features I need, but it's been quite difficult for me to understand what the total throughput is if I don't need a firewall on it. If the Balance One supports up to 5 WAN connections, does that mean it can handle load-balancing 5 Gbps of traffic? Doesn't seem so; it looks like the actual throughput is the "Stateful Firewall Throughput", which would then be barely 600 Mbps... Which router would support 2 Gbps total throughput to be able to saturate both ISPs? If 2 Gbps is not possible within a reasonable budget, I can settle on 1 Gbps, that's plenty for me, too.

Thanks!

5 points | by Tenemo 389 days ago

5 comments

  • toast0 389 days ago
    > What kind of a device will allow me to connect it via cat6 Ethernet to two different routers on different subnets? Is it what all those "WAN" ports on the multi-WAN routers expect already that I'll do?

    Yes. The multi-WAN routers can (or should) be able to manage any sort of WAN over Ethernet handoff; bridged PPPoE, DHCP on public address, DHCP on private address, static IPs. One of those fits what your ISP router provides to LAN clients.

    Personally, I'd take the time and build an overly complicated custom system (and I have!), but off the shelf should work too. The tricky thing if you build it yourself is determining when the links are up. For my system, I have PPPoE as primary and low-quota LTE as secondary, so it's easier: if the PPP link is connected and has responded to ping within the last 30 seconds, I use that; if not, I use LTE. In your case, you might need to build some other liveness check.
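
    The primary/secondary rule described above could be sketched roughly as follows. This is only an illustration in Python, not the actual system: the `LinkState` type, the `is_alive` and `choose_link` helpers, and the idea of feeding in timestamps by hand are all assumptions made for the example. Only the 30-second window comes from the comment.

```python
from dataclasses import dataclass
from typing import Optional

# Freshness window from the comment above; everything else here is hypothetical.
FRESHNESS_WINDOW_S = 30.0


@dataclass
class LinkState:
    name: str
    connected: bool                    # e.g. the PPP session is established
    last_ping_reply: Optional[float]   # timestamp (seconds) of the last ping reply


def is_alive(link: LinkState, now: float,
             window: float = FRESHNESS_WINDOW_S) -> bool:
    """A link counts as alive if it is connected and replied to ping recently."""
    if not link.connected or link.last_ping_reply is None:
        return False
    return (now - link.last_ping_reply) <= window


def choose_link(primary: LinkState, secondary: LinkState, now: float) -> LinkState:
    """Prefer the primary link while it is alive, otherwise fall back."""
    return primary if is_alive(primary, now) else secondary


if __name__ == "__main__":
    pppoe = LinkState("pppoe", connected=True, last_ping_reply=100.0)
    lte = LinkState("lte", connected=True, last_ping_reply=100.0)
    print(choose_link(pppoe, lte, now=120.0).name)  # reply is 20 s old
    print(choose_link(pppoe, lte, now=140.0).name)  # reply is 40 s old
```

    With two equal fiber links there is no natural "primary", so a real check would probably have to probe both links the same way and only route over the ones that pass.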

    Don't expect perfect balancing either --- sometimes multiple bulk downloads will end up using the same connection and you won't have any using the other one, but statistically things should work out.
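
    To illustrate why the balancing is only statistical, here is a small Python sketch that assumes the router pins each connection to one WAN link by hashing its addresses and ports. That policy, and the `pick_wan` and `spread` helpers, are assumptions for the example; actual multi-WAN routers may use other schemes (weights, least-used link, and so on).

```python
import hashlib
from collections import Counter
from typing import Iterable, Tuple

# (src_ip, src_port, dst_ip, dst_port); a simplified flow key for illustration.
Flow = Tuple[str, int, str, int]


def pick_wan(flow: Flow, n_links: int = 2) -> int:
    """Map a flow to a WAN index with a stable hash, so every packet of the
    same connection leaves through the same link."""
    key = "|".join(map(str, flow)).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % n_links


def spread(flows: Iterable[Flow], n_links: int = 2) -> Counter:
    """Count how many flows land on each WAN link."""
    return Counter(pick_wan(f, n_links) for f in flows)


if __name__ == "__main__":
    # Two bulk downloads: nothing prevents both from hashing to the same link.
    few = [("192.168.1.10", 50000 + i, "203.0.113.5", 443) for i in range(2)]
    # Many flows: the split should come out close to even.
    many = [("192.168.1.10", 50000 + i, "203.0.113.5", 443) for i in range(10000)]
    print("2 flows:    ", dict(spread(few)))
    print("10000 flows:", dict(spread(many)))
```

    Because a single connection never spans both links under this kind of policy, one download tops out at one ISP's speed, which matches the "two machines at 800-900 Mbps each" goal rather than a single 2 Gbps stream.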

  • Dah00n 389 days ago
    Sorry I don't have time to add much info, but I have a setup I'm quite happy with. I use OPNsense (PfSense is banned in my house, with all the bad things they've done) and it seems to cover your needs too if I didn't miss something. The non-open source options are, in my opinion, awful and will at some point (if not already) be a security menace.

    Edit:

    >If 2 Gbps is not possible within a reasonable budget, I can settle on 1 Gbps, that's plenty for me, too.

    Even the very old, end-of-life PC Engines APU1D (AMD T40E Bobcat) can handle 1 Gbps (I have 4). If something new cannot, I would look elsewhere.

    • KomoD 389 days ago
      > PfSense is banned in my house, with all the bad things they've done

      Like?

      • Dah00n 387 days ago
        For one, pfSense used the domain opnsense.com to discredit OPNsense and had the domain taken away from them. They also heavily edit Wikipedia if anything negative is added. There are some pages on Google about it, but I can't be bothered to search for it atm. See pfsense @ wikipedia for the domain thing.
      • polski-g 389 days ago
        • KomoD 389 days ago
          That's a terrible example: a random Reddit post from 6 years ago with no explanation.
  • graybies 388 days ago
    Off the shelf? The EdgeRouter Lite can do 1 Gbps dual-WAN with load balancing and failover. Not sure if it can do 2.

    Otherwise, building a pfSense/OPNsense box would probably be the route you want to take.

  • aynyc 389 days ago
    I’m no network expert, but Synology routers support a dual-WAN interface. Would that work?
  • johnklos 389 days ago
    Many years ago, Fry's Electronics had a close-out on 2014 era mini-ITX AMD AM1 motherboards and CPUs (less than $50 for motherboard, CPU, heat sink and fan). I've been using them, with suitably small cases, as NAT / router / firewall / DNS machines, and even though there are just four older 2 GHz cores, with NetBSD and npf I'm able to run at full gigabit speeds just fine. I've set up around ten of these AMD systems and have run them for many years.

    One way I've shared Internet with two providers is by having two routing machines, each configured with its own Internet connection, with different local IPs on the same LAN. Clients can each be configured to use the preferred gateway, or they can be switched using DHCP. Likewise, when one of the lines is down, the default route can be removed from the machine that handles that line and can be switched to the other machine's internal IP, which will DTRT and will cause clients to use the other. This has the advantage of not requiring waiting for an updated DHCP lease, so switching can be nearly instantaneous.

    This is useful if one line is more robust than the other but isn't as fast - the machines that need a reliable connection always use the robust connection, and everything else opportunistically uses whatever's available.

    Another way is simple round-robin NAT across both connections.

    Neither helps if the line that you're using drops mid-call, but there are ways to deal with that, if you have high speed available at a datacenter. For instance, I've set up routing of a small public subnet via tinc (https://www.tinc-vpn.org) over two routing machines, each with their own Internet connection, with CARP so that packets can go through either machine. This makes handing off from one to the other transparent so that connections don't need to be reestablished.

    To get back to your original query, I haven't seen any off-the-shelf NAT router that does what a host-based router can do without either some parts being proprietary or a good number of drawbacks and limitations, nor have I seen hardware that can do anything fancy (that is, anything beyond the most simple routing / NAT) at high speeds without spending lots of $.

    Recent connections that are 2 Gbps and faster are served very well by Ryzen 5600X systems with 2.5 and 10 gigabit ethernet, and those systems cost around $500 each.

    Most of my machines were set up many years ago, and automatic scripts update things like BIND and other software, so they require almost no maintenance after initial setup. On the other hand, remote administration is dead easy because they all use ssh with keys (no passwords), and can be used to help facilitate remote administration of machines on the local network, too.

    In other words, I can't think of any reason any more to buy off-the-shelf NAT routers. Even if I wanted to go that route, there are too many shortcomings for me to imagine doing that - I'd give up significantly more flexibility than the amount of time I'd save would ever possibly balance.

    To answer your questions about layering NAT routers, yes, you can do that, although it's discouraged. If you're forced to use the ISP's routers, you should at least ask if they can be put in bridge mode so your device can do the NAT, since many of those ISP provided routers have tiny NAT state tables and/or time out NAT states for no good reason.

    Also, there's no such thing as NAT that wouldn't fit the definition of "stateful firewall", so it's hard to know what they're advertising. For instance, the Peplink Balance 580 advertises 1.5 Gbps throughput, and that's in aggregate - it definitely can't handle all five WAN links at 1 Gbps at the same time, else they'd advertise 5 Gbps. Considering the prices of the hardware, a brand new, physically tiny, host-based NAT router / firewall would be both cheaper and significantly faster.
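
    The arithmetic behind that can be written out as a tiny Python sketch. It treats the advertised figure as a hard cap on traffic summed across all WAN ports, which is a simplification; `achievable_mbps` is a made-up helper, and the 1.5 Gbps figure is the one quoted above rather than a measured number.

```python
from typing import Sequence


def achievable_mbps(link_rates_mbps: Sequence[float],
                    router_aggregate_mbps: float) -> float:
    """Total throughput is limited by whichever is smaller: the sum of the
    WAN link rates, or the router's aggregate forwarding capacity."""
    return min(sum(link_rates_mbps), router_aggregate_mbps)


if __name__ == "__main__":
    # Five 1 Gbps WAN links behind a router rated for 1.5 Gbps in aggregate.
    print(achievable_mbps([1000] * 5, 1500))
    # The two ~900 Mbps links from the question behind the same router.
    print(achievable_mbps([900, 900], 1500))
    # The same two links behind a router that can forward 2 Gbps.
    print(achievable_mbps([900, 900], 2000))
```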

    Just some thoughts :)

    • daydream 389 days ago
      > Neither helps if the line that you're using drops mid-call, but there are ways to deal with that, if you have high speed available at a datacenter. For instance, I've set up routing of a small public subnet via tinc (https://www.tinc-vpn.org) over two routing machines, each with their own Internet connection, with CARP so that packets can go through either machine. This makes handing off from one to the other transparent so that connections don't need to be reestablished.

      Can you explain more about the setup above? I don’t understand how tinc comes into play. What OS and physical configuration are you using?

      • johnklos 388 days ago
        Sure :)

        I'm running tinc on NetBSD.

        Starting with how it works and physical setup, you have tinc running, attached to a tap interface, bridged to an ethernet interface. The ethernet interface is configured on your LAN to route (not NAT) for your local computer(s) / VoIP devices.

        What's good about this is that a machine on your local network (seeable from the ethernet that's part of the bridge) can be on a 100% public address, if you want, or on a private IP range. When routing, as opposed to NAT, only the endpoints care about the existence / state of any connections, which is one of the reasons the Internet is so resilient (your intermediate hops can change without needing to renegotiate sessions).

        This means that intermediate machines, including the ones running tinc, can be down for a minute - even restarted - and, if the machines are patient, the endpoint machines will just see a temporary pause.

        So take two machines, each running tinc, each configured with CARP, sharing the gateway IP for the public subnet from the upstream machine. Your local machine running your video conferencing software (or, perhaps VoIP phone) communicates through the primary (active) CARP local machine. Suddenly, a shot rings out! And the primary CARP machine dies, and the maid screams, but before the scream the secondary CARP machine takes over the IP and is routing those video (or SIP) packets, so we can hear the maid scream.

        There are ways to failover NAT, but they're complicated because the primary and the failover machines need to constantly share the NAT state table. Even if you're routing (not NATing) private IP subnets, so long as you're doing NAT upstream, it still works.
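
        Here's a toy Python model of the difference, in case it helps. It's heavily simplified: real NAT keys on the full 5-tuple and has timeouts, and `NatBox` and `PlainRouter` are names invented for this sketch, not anything from tinc, CARP or npf.

```python
from typing import Dict, Iterable, Optional, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


class NatBox:
    """Toy NAT: return traffic can only be delivered via a per-connection table."""

    def __init__(self, public_ip: str, first_port: int = 40000) -> None:
        self.public_ip = public_ip
        self._next_port = first_port
        self._by_public_port: Dict[int, Endpoint] = {}
        self._by_private: Dict[Endpoint, int] = {}

    def outbound(self, private_src: Endpoint) -> Endpoint:
        """Create (or reuse) a mapping and return the public-side source."""
        port = self._by_private.get(private_src)
        if port is None:
            port = self._next_port
            self._next_port += 1
            self._by_private[private_src] = port
            self._by_public_port[port] = private_src
        return (self.public_ip, port)

    def inbound(self, public_dst_port: int) -> Optional[Endpoint]:
        """Translate a reply back to the private host, or None if no state."""
        return self._by_public_port.get(public_dst_port)


class PlainRouter:
    """Toy router: forwarding depends only on the destination address."""

    def __init__(self, lan_hosts: Iterable[str]) -> None:
        self.lan_hosts = set(lan_hosts)

    def inbound(self, dst: Endpoint) -> Optional[Endpoint]:
        return dst if dst[0] in self.lan_hosts else None


if __name__ == "__main__":
    # NAT: the backup takes over the IP but has none of the primary's state.
    primary, backup = NatBox("198.51.100.1"), NatBox("198.51.100.1")
    public_src = primary.outbound(("192.168.1.10", 5060))
    print("NAT primary:", primary.inbound(public_src[1]))
    print("NAT backup: ", backup.inbound(public_src[1]))

    # Routing: either machine can forward the reply, no shared state needed.
    r1, r2 = PlainRouter({"203.0.113.10"}), PlainRouter({"203.0.113.10"})
    print("Router 1:   ", r1.inbound(("203.0.113.10", 5060)))
    print("Router 2:   ", r2.inbound(("203.0.113.10", 5060)))
```

        The backup NAT box drops the reply because the mapping only ever existed on the primary, while the two plain routers are interchangeable. That's the whole reason the CARP handoff is transparent when you route.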

        Does that make sense?