Requesting a sanity check...help untangle my home network as I expand into more advanced networking?

surfrock66@lemmy.world · 1 year ago

Requesting a sanity check...help untangle my home network as I expand into more advanced networking?

surfrock66@lemmy.world · 1 year ago

I probably need to burn it down and restart, but I need to find a time the family will tolerate an extended outage. I did share some things on the opnsense forum though which might be useful here.

My diagram. At the bottom you will see why I have /16; in truth, it’s from back when I only had a single subnet, and I made it /16 so I could use the third octet to form DHCP scopes. That’s how the network worked in my head and I knew the IP scheme, so when it came time to add VLANS much later, I just made those the 2nd octet, and that’s how we are here today. Maybe one day I’ll re-do that, but it’s not in scope right now: https://nextcloud.surfrock66.com/s/txnZdzxHaiA5t65
I did an experiment with static routes last night. I have the static route in, so I untagged the “LAN_GW” as an upstream gateway, and tagged “WAN_GW” as an upstream gateway. No change in the ability for opnsense to ping anything (it can ping WAN, not LAN), however all my LAN clients lost internet. In this state, from opnsense, I ran a “ping -S 10.99.1.40 10.2.2.213” (that’s my DNS server). This failed, but interestingly enough I was looking at the live logs, and even though the interface is LAN, the source IP was the WAN IP. I’m very confused; I’ve confirmed the LAN and WAN interfaces are correct and they have correctly assigned default gateways. See the attached picture. This would make sense; is opnsense doing something to switch the LAN and WAN somehow? I’m blown away how this is the case; that being said, it makes sense that tagging the LAN interface as upstream allows traffic out.

It feels like somehow opnsense is treating LAN like WAN or something? I don’t know the obfuscation feels like it’s hiding things. A “ping -S 10.99.1.40 10.2.2.213” shouldn’t show in the logs with a source of the WAN address, right???

tuxed@sh.itjust.works · 1 year ago

Okay, I think I know (at least one of) the problem(s).

It is sending the ping from the WLAN interface because that is your default route, and you either don’t have a route specified for your 10.2.x.x network or you’re overwriting it with a different route (I’m guessing the first option).

E.g. you need to tell your firewall “if you want to reach an ip-address in 10.2.x.x you need to go through here”, with “here” probably being either your managed switch if it works as a gateway (10.6.1.254?) or an interface on your router if it works as a switch (10.6.1.41?).

surfrock66@lemmy.world · 1 year ago

I’m totally with you…and I have that, which is why I think I’m hitting some sort of bug, or a firewall rule that is somehow breaking this:

tuxed@sh.itjust.works · 1 year ago

Have you tried setting the gateway to one of your LAN interfaces? And what happens if you ping 10.99.1.254 from the firewall?

surfrock66@lemmy.world · 1 year ago

If I go to my LAN interface and set the gateway to “LAN_GW” at 10.99.1.254, everything works (but I can’t ping anything on the LAN from the firewall itself, including the client I’m ssh’d from). If I set that to Auto, all LAN clients lose WAN access.

I’ve got a backup, but I think I’m gonna try to rebuild from scratch :/ I just worry I’m gonna end up in the same spot since I don’t understand how it all got here and don’t know what to avoid.

tuxed@sh.itjust.works · 1 year ago

Probably a good idea sadly… There can be a lot of different things wrong, so will probably be faster doing that either way.

When rebuilding, try to verify each that each step works so you find the problem eventually, Im guessing it will be easier to find that way

surfrock66@lemmy.world · 1 year ago

Ok, good news, I re-imaged and after about an hour of tinkering it’s working. (My wife is a doctor who does tele-medicine from home so it was tricky to get a downtime, even riskier if I couldn’t get back to working; usually she works when kids are in bed and that usually my window for these kind of projects). I still have my old config backup; I have a lot of firewall rules and services to put back in (I had redirects for google trying to reach their dns from chromecasts to my pihole, I had a zabbix client pointing to my zabbix server, I had wireguard working and want to see if I can restore existing key exchanges, it was tied to my LDAP server, etc). I really want to compare my old backup with a new one when this is done and see if I can’t figure out what was broken. I want to document that because I found a bunch of people with similar questions that only had incomplete answers:

From the CLI, the WAN interface was DHCP, I set up the lagg between my 2 ports (lagg0), created a vlan 99 interface off of it (lagg0_vlan99) and made that the LAN interface with a static IP and no gateway.
I made a gateway for my 10.99.1.254 LAN gateway, had to assign it to the LAN interface when I made it. It is not tagged as upstream. One thing I noticed, WAN_GW is priority 255; it was 254 before. Just a difference I noticed.
I made an alias for each of my VLANS that might need internet access
In Outbound NAT, I switched it to Hybrid and made rules to allow traffic through to each VLAN.
Under Firewall->Rules->LAN I created a pass rule for each VLAN (This will get tuned later)

With this, LAN clients access the WAN, after putting in a port forward WAN clients can access things on the LAN, the firewall can ping both LAN and WAN.

tuxed@sh.itjust.works · 1 year ago

Glad to hear it seems to be working! Hoping you find the issue in the backups, would be interesting to know what went wrong haha