2020-04-18 I switched my home network to IPv6

As I said in this post, I have a router running pfSense to serve my home computers. Over the last few weeks, I decided to explore how much effort it would be to enable IPv6 for the router and all the devices on the LAN. I did not know much about how IPv6 actually works at the routing level, so it was a good learning experience.

The tl;dr is that everything was pretty much turn-key, except for getting Docker to serve publicly routable IPs to its containers, and getting NAT64 working on pfSense using tayga. The documentation for the former is either outdated or doesn't take firewalld into account, and the documentation for the latter doesn't exist anywhere on the internet. So I hope this post helps other people trying the same thing.

The IPv4 setup

This is my existing IPv4 setup. I've labeled the interfaces with their names (em0 and igb* for the router, enp* for the computers, etc):

                           +-----+
                           | ISP |
                           +-----+
                              |
                              |
+-----------------------------o----------------------------+
|                         WAN (em0)                        |
|                                                          |
|                      pfSense Router                      |
|                                                          |
|      +--------------+--[bridge0]---+--------------+      |
|      |              |              |              |      |
| LAN1 (igb0)    LAN2 (igb1)    LAN3 (igb2)    LAN4 (igb3) |
+------------------------------------o--------------o------+
                                     |              |
                                     |              |
                                +----o----+    +----o---------------+
                                | enp0s25 |    | enp4s0    docker0  |
                                |         |    |                    |
                                | laptop  |    |      desktop       |
                                +---------+    +--------------------+

The bridge0 interface is a bridge across all the LAN NICs of the router and binds to the 192.168.1.0/24 subnet. The DHCP server runs on the bridge and serves IPs from a pool in this subnet.

The desktop and laptop run openSUSE Tumbleweed, and use systemd-networkd for network management.

Lastly, the desktop runs Docker containers, which means it has its own docker0 bridge interface that runs its own DHCP server and NAT for the Docker containers.

The goals of the exercise, in decreasing order of importance, were:

  1. Get a publicly routable IPv6 address for the router.

  2. Get publicly routable IPv6 addresses for all the physical machines behind the router.

  3. Get publicly routable IPv6 addresses for all Docker containers on the desktop.

  4. Disable IPv4 on the LAN completely.

Note that publicly routable doesn't mean publicly reachable. The router would still be running a firewall and default to dropping incoming connections on the WAN. The advantage of having publicly routable IPs is merely to have a single globally valid address for every resource.

Interlude

(This section explains basic IPv6 terminology and prefix delegation. Skip to the next section if you already know this.)

An IPv6 address is made up of eight segments, where each segment is a 16-bit value. The string representation writes each segment as one to four hex digits (leading zeros stripped), with segments separated by :. A single run of consecutive all-zero segments can be replaced with ::. Thus the address represented by 2001:db8:0:1::2:3 is the same as the address represented by 2001:0db8:0000:0001:0000:0000:0002:0003.

Just like IPv4, a range of addresses is represented with CIDR notation. The 2001:db8::/32 range is reserved for examples, which is why I'll use it throughout this post.

Devices that want to get addresses from a router send out solicitation requests, to which the router responds by advertising itself using Router Advertisement (RA) messages. The RA message contains the information about the DHCPv6 server, if one exists on the network. The devices may then get addresses from the DHCPv6 server, which may be static or random, or they may use SLAAC to construct addresses for themselves based on the prefix advertised in the RA message plus their MAC address.

When it comes to the mechanism of how routers talk to gateways (upstream routers), it's worth differentiating between link prefixes and routed prefixes. The router first gets an IP for itself from the gateway, say 2001:db8:0:1:2:3:4:5/64. This IP is called the link prefix, and the /64 here just represents the length of the prefix. The router then requests the gateway to delegate another range of addresses to it, which it can then serve to its downstream devices. The gateway might decide to give 2001:db8:0:2::/64 to the router. This is the routed prefix. The gateway only needs to remember that all traffic for all addresses in the 2001:db8:0:2::/64 range should be routed to the router at 2001:db8:0:1:2:3:4:5; it does not need to store individual routes for addresses within that range. This process, where a device obtains a routed prefix from an upstream router so that it can itself act as a router, is called prefix delegation.

When subnetting IPv6, one usually does not make subnets smaller than /64. Within a /64, the router and the devices can discover each other using NDP. Splitting a subnet across multiple links requires proxying the NDP messages across the disjoint links, which is doable but more trouble than it's worth.

This means that if the routed prefix is only a /64, the router can only create one /64 subnet. If you want more subnets, you have to work with your upstream to have them delegate you a larger prefix, say a /60 or /56 or /48. For example, if your router gets a /60, it can make more than one /64 subnet, 2^(64 - 60) = 16 subnets to be precise.
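
For instance, a hypothetical delegation of 2001:db8:0:10::/60 (a made-up prefix under the documentation range) covers the 16 /64s obtained by varying the low four bits of the fourth segment:

2001:db8:0:10::/64
2001:db8:0:11::/64
2001:db8:0:12::/64
...
2001:db8:0:1f::/64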

Level 1: IPv6 for the router

My ISP does not yet support IPv6, so I set up a tunnel with Hurricane Electric. Setting it up with pfSense was just a straightforward matter of following Netgate's documentation - I added a GIF interface which then functions as a second WAN interface for the other services on the router. At this point I was using just a /64 that HE hands out by default.

Level 2: IPv6 for all the physical machines behind the router

Following the Netgate documentation, I enabled the DHCPv6 server on the bridge0 interface. I did not want to add static mappings for the two devices, but I did want them to have fixed addresses instead of periodically rotating ones, so that I could add DNS records for them. So I set the router's RA daemon to run in "Stateless DHCP" mode. This meant the devices would use SLAAC, which as I said above meant they would get deterministic addresses.

For systemd-networkd, this means the network file for the enp4s0 interface looks like:

[Match]
Name=enp4s0

[Network]
DHCP=yes
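
Once systemd-networkd picks this up, a quick way to check which addresses the interface actually acquired (these are just the usual iproute2 and networkctl commands, nothing specific to this setup):

ip -6 addr show dev enp4s0
networkctl status enp4s0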

Level 3: IPv6 for all the Docker containers

Here, all hell broke loose.

Recall that the desktop has an enp4s0 interface connected to the router, and a docker0 bridge created by Docker automatically. Ideally, you'd be able to configure the Docker daemon to set up routes such that containers can talk to the DHCPv6 server over the enp4s0 interface. However, while this may be possible for custom Docker networks I create myself (using the macvlan driver), I did not find any way to do it for the default network Docker creates.
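
(For reference, a custom network of that sort would presumably be created with something along these lines, using the LAN's /64 and the router as gateway. I did not actually test this, so treat it as a sketch rather than a working recipe.)

docker network create -d macvlan --ipv6 \
    --subnet 2001:db8:0:1::/64 --gateway 2001:db8:0:1::1 \
    -o parent=enp4s0 lan6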

The other way is to use prefix delegation and delegate a whole /64 to the Docker host. In this case, the host acts as a router for the containers. Assuming the host is able to get a prefix 2001:db8:0:f002::/64 delegated to it, you would configure Docker to use it by setting the fixed-cidr-v6 field in /etc/docker/daemon.json. You'd also add the DNS server's IPv6 address:

{
  ...
  "dns": [
    "2001:db8:0:1::1",
    "192.168.1.1",
  ],
  "ipv6": true,
  "fixed-cidr-v6": "2001:db8:0:f002::/64"
}
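
After editing daemon.json, the daemon needs a restart, and the default bridge network should then show IPv6 enabled along with the new subnet (the network is named bridge even though its interface is docker0):

systemctl restart docker
docker network inspect bridge    # look for "EnableIPv6": true and the /64 under IPAM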

So how does one get the host to request a prefix delegation from the router? Recall that the first step is to request a larger prefix than /64 from the ISP. In my case, HE does let you opt in to get a /48, so I did that and updated the LAN bridge IP to be under the new /48 prefix. I then configured the DHCPv6 server to reserve one 2001:db8:0:f002::/64 subnet under the /48 for prefix delegation.

As for the desktop, the documentation for systemd-networkd does have an example of prefix delegation:

# /etc/systemd/network/55-ipv6-pd-upstream.network
[Match]
Name=enp1s0

[Network]
DHCP=ipv6

# /etc/systemd/network/56-ipv6-pd-downstream.network
[Match]
Name=enp2s0

[Network]
IPv6PrefixDelegation=dhcpv6

Translating this to the desktop's setup, the first file is for the enp4s0 network and the second for the docker0 network. The file for enp4s0 already matches what I have (DHCP=yes means both ipv4 and ipv6), so that's fine. But I don't have a network file for the docker0 interface since it's supposed to be managed by Docker, not systemd-networkd.

After a bunch of fiddling around, it did not seem to me that it was possible to have systemd-networkd request a prefix delegation without also having it manage the interface that the routed prefix would be bound to. I decided to use dhclient directly. To test, I ran dhclient -d -P -v enp4s0 where -d tells the program to run in the foreground instead of forking a background daemon, and -P tells it to make a prefix delegation request. It successfully requested a prefix, and I could see the DHCPv6 lease in the pfSense status screen. Excited, I created two alpine containers and ran ip a in them, and was delighted to see they'd bound to addresses within the delegated /64.

I then had one of the containers run nc -l -p 8080, a simple netcat server, and had the other container run nc <IP of the first container> 8080, a netcat client to connect to the server. I hoped to see a successful connection. Instead, the client exited almost immediately. I re-ran the containers with --privileged, installed strace with apk add, and ran both the server and client nc processes under strace -fe network,read,write. This showed me that the server successfully bound to [::]:8080, but the client failed its connect() call with EACCES.
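
(For reference, the commands in each container were roughly the following, with the server container's actual address in place of the placeholder:)

# in the server container
apk add strace
strace -fe network,read,write nc -l -p 8080

# in the client container
apk add strace
strace -fe network,read,write nc <IPv6 of the server container> 8080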

EACCES sounds like a permissions issue, but it didn't make sense. You can get EACCES if you try to bind to a low port (less than 1024), but the nc in the containers was running as root so that wasn't the problem. You can also get EACCES if the container lacks some network capability, but this was happening with --privileged containers so that wasn't the problem either. Just to be sure, I also gave the containers the NET_ADMIN cap, with no change.

I then attempted to connect to the netcat server with a client nc running on the host, and that succeeded! So the issue was certainly not with the server or its listening socket.

I said above that I'd verified the DHCPv6 lease for the delegated prefix in pfSense. Now I decided to also check the pf routes table. Recall that the upstream router needs to associate the routed prefix with the IP of the subrouter, which in this case means pfSense should've added a route to its routes table associating the delegated prefix with the IP it leased to the dhclient instance. I was surprised to see there was no such route.

After some searching, I found the code in pfSense that creates the routes in response to prefix delegation requests at /usr/local/sbin/prefixes.php. It parses the lease out of /var/dhcpd/var/db/dhcpd6.leases and munges it into a /sbin/route add command. Specifically, if both an ia-na section and an ia-pd section are found for the same DUID, then the link address from the ia-na section and the routed prefix from the ia-pd section are used for the route command.
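
In other words, for a delegation like mine it would end up running something along these lines, where the first address is the routed prefix and the second is the subrouter's link address from the ia-na section (both made up here):

/sbin/route add -inet6 2001:db8:0:f002::/64 2001:db8:0:1:aaaa:bbbb:cccc:dddd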

When I checked the leases file, I saw an ia-pd section but no ia-na section. It made sense - the subrouter was using SLAAC after all. In a way, pfSense's implementation makes sense. Without a stateful lease for the subrouter, the upstream router cannot necessarily add a route for it.

So I added a static DHCPv6 mapping for the host's DUID (with the same IP it had derived for itself via SLAAC, to avoid having to change other things), and switched the RA daemon to use "Managed" mode. I also noticed that the DUID used by dhclient was different from the DUID used by systemd-networkd, so I edited the /var/lib/dhcp6/dhclient.leases file to have the same DUID as systemd-networkd's. The DUID that systemd-networkd uses is deterministic (based on the /etc/machine-id) so it would not need anything special to remain in sync. After restarting the network on the enp4s0 interface, I saw the managed mode take effect. I restarted the dhclient process and it acquired the lease again, but there was still no ia-na section in the dhcpd6.leases file, and there was still no route added for the delegated prefix.

In hindsight, it was obvious why there was no ia-na section in the leases file: the IP was given via a static assignment. But in that case pfSense could've preloaded the DUID-to-link-address map from the static DHCPv6 mappings instead of requiring the ia-na section. I may file a bug for this later.

At any rate, it looked like I would have to use stateful DHCPv6 without static mappings, so I removed the static mapping for the host from the router, then restarted the network and dhclient again. As expected, this time the host did obtain a random address from the DHCP pool, and I finally saw the ia-na section in the leases file. I also saw the route for the delegated prefix created successfully.

But I still wanted a static IPv6 address for the desktop, so I was not happy with this state of affairs. Luckily the desktop's motherboard has two NICs, enp4s0 and enp6s0. So I decided to have enp4s0 continue to use SLAAC so that I could add a DNS entry for its deterministic address, and have enp6s0 be the one to use stateful DHCPv6 with a separate dynamic address. So I found another LAN cable and plugged it into LAN2 (igb1).

That said, all the LAN NICs in the router were bridged, as I'd described at the start of this post, so there was only one instance of the DHCPv6 server and RA daemon. I could not have the two LAN NICs behave differently with respect to the DHCPv6 mode.

I had to destroy the bridge and make each of the four LAN NICs into separate subnets, each with their own DHCPv6 server and RA daemon. In hindsight, this was a better design anyway, since it's closer to how the subnetting should be done.

So now my network topology looks like this:

                           +-----+
                           | ISP |
                           +-----+
                              |
                              |
+-----------------------------o----------------------------+
|                         WAN (em0)                        |
|                                                          |
|                      pfSense Router                      |
|                                                          |
| LAN1 (igb0)    LAN2 (igb1)    LAN3 (igb2)    LAN4 (igb3) |
+------o--------------o--------------o--------------o------+
                      |              |              |
                      |              |              |
                      |         +----o----+         |
                      |         | enp0s25 |         |
                      |         |         |         |
                      |         | laptop  |         |
                      |         +---------+         |
                      |                             |
                      |                             |
                  +---o-----------------------------o---+
                  | enp6s0        docker0        enp4s0 |
                  |                                     |
                  |               desktop               |
                  +-------------------------------------+

Each LAN NIC in the router is now its own subnet instead of being bridged. LAN2's DHCPv6 server also has an additional /64 prefix available to delegate, and its RA daemon runs in "Managed" mode.

On the desktop, I added a network file for enp6s0 so that systemd-networkd would manage it:

[Match]
Name=enp6s0

[Network]
DHCP=ipv6

Of course, I also switched dhclient to run on enp6s0 instead of enp4s0. One thing that tripped me up here was that dhclient -P -v enp6s0 still kept renewing the lease for enp4s0. I eventually discovered this is because dhclient renews all the leases it sees in its leases file regardless of which interfaces it was given on its command line. So I also had to manually clear the enp4s0 leases from the /var/lib/dhcp6/dhclient.leases file.
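
In hindsight, a cleaner fix might have been to give each interface its own lease file with dhclient's -lf flag, something like the following, though I did not go back and test it:

dhclient -P -v -lf /var/lib/dhcp6/dhclient-enp6s0.leases enp6s0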

Now that I had a functioning prefix delegation and also a valid route, I tested the Docker containers again. But still the exact same thing happened - the client failed its connect() syscall with EACCES. I now turned to man connect, which says:

EACCES

For UNIX domain sockets, which are identified by pathname: Write permission is denied on the socket file, or search permission is denied for one of the directories in the path prefix. (See also path_resolution(7).)

EACCES, EPERM

The user tried to connect to a broadcast address without having the socket broadcast flag enabled or the connection request failed because of a local firewall rule.

"The connection request failed because of a local firewall rule" is the only one that could apply.

As an aside, openSUSE has both iptables and nftables available, and also defaults to using firewalld for the firewall. However, unlike the upstream firewalld code, the package in openSUSE defaults to its iptables backend instead of its nftables backend. This is because Docker itself uses iptables and works by putting its rules ahead of all existing rules; if firewalld defined its rules using nftables, they would run in addition to Docker's rules and could override them. That said, openSUSE also has the iptables-backend-nft package, which causes all invocations of iptables to define rules using nftables anyway. Thus both firewalld and Docker end up defining rules that are visible using the nft CLI. This is the configuration I run, since I find the nft CLI easier to use than iptables and ip6tables. For example, flushing all rules with nft is just nft flush ruleset, but needs fourteen commands with iptables.

So back to the problem, I decided to flush all the host's firewall rules with nft flush ruleset and see what would happen. With an empty ruleset, the kernel should not block any packets from being forwarded to wherever they need to go. Running the test again, it succeeded! The netcat client container was able to connect to the server container and they were able to send TCP messages back and forth. I also tried to connect to the server container from the laptop, which was on a different subnet, and that also worked, demonstrating that the routing was set up correctly even for hosts in different subnets to talk to each other.

The question now was which rule was causing the problem. After some cycles of restoring all rules with firewall-cmd --reload and flushing individual tables and chains with nft flush table ... and nft flush chain ..., I eventually realized the issue was with the ip6 filter FORWARD chain. These were the relevant rules in the output of nft -a -n list table ip6 filter:

chain FORWARD { # handle 2
        type filter hook forward priority 0; policy accept;
        # xt_conntrack counter packets 8531 bytes 8223657 accept # handle 12
        iifname "lo" counter packets 0 bytes 0 accept # handle 13
        counter packets 31 bytes 2352 jump FORWARD_direct # handle 15
        counter packets 31 bytes 2352 jump RFC3964_IPv4 # handle 36
        counter packets 31 bytes 2352 jump FORWARD_IN_ZONES # handle 17
        counter packets 0 bytes 0 jump FORWARD_OUT_ZONES # handle 19
        # xt_conntrack counter packets 0 bytes 0 drop # handle 20
        counter packets 0 bytes 0 # xt_REJECT # handle 21
}

chain FORWARD_IN_ZONES { # handle 16
        iifname "enp6s0" counter packets 11 bytes 800 goto FWDI_internal # handle 76
        iifname "docker0" counter packets 20 bytes 1552 goto FWDI_internal # handle 73
        counter packets 0 bytes 0 goto FWDI_public # handle 160
}

chain FWDI_internal { # handle 51
        counter packets 31 bytes 2352 accept # handle 163
        counter packets 0 bytes 0 jump FWDI_internal_pre # handle 57
        counter packets 0 bytes 0 jump FWDI_internal_log # handle 58
        counter packets 0 bytes 0 jump FWDI_internal_deny # handle 59
        counter packets 0 bytes 0 jump FWDI_internal_allow # handle 60
        counter packets 0 bytes 0 jump FWDI_internal_post # handle 61
        meta l4proto 58 counter packets 0 bytes 0 accept # handle 80
}

I had added the enp6s0 and docker0 interfaces to the internal zone in firewalld, which is why they connect to the FWDI_internal chain.

Removing the REJECT rule with nft delete rule ip6 filter FORWARD handle 21 was enough to make the tests work. But the more appropriate way to do this would be to add ACCEPT rules to the FWDI_internal chain since that is specific to the internal zone's interfaces. Indeed, running:

nft add rule ip6 filter FWDI_internal meta l4proto 6 counter packets 0 bytes 0 accept
nft add rule ip6 filter FWDI_internal meta l4proto 17 counter packets 0 bytes 0 accept

... was also sufficient to make the tests work. (The existing rule for protocol 58 was for IPv6-ICMP. The rules I added were for protocol 6 which is TCP and 17 which is UDP.)

I initially considered putting these commands in a script and running it from the docker.service systemd unit's ExecStartPost. But this would be brittle since any reload of the firewall would break Docker until Docker itself was restarted. A better way was to have firewalld add the rule itself, by adding a "direct" rule:

firewall-cmd --permanent --direct --add-rule ipv6 filter FWDI_internal 99 -j ACCEPT -p all

# equivalent to having firewalld run   ip6tables -A FWDI_internal -j ACCEPT -p all
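
To make the new direct rule take effect and double-check that it's registered, the usual firewall-cmd invocations apply:

firewall-cmd --reload
firewall-cmd --direct --get-all-rules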

I verified the tests still ran. The last step was to automate the dhclient command to run every time Docker started. For this I made a separate docker-dhcp-pd.service service:

[Unit]
Description=DHCPv6-PD for Docker
After=network.target

[Service]
Type=forking
ExecStart=/sbin/dhclient -P -v -pf /var/run/dhclient-enp6s0.pid enp6s0
PIDFile=/var/run/dhclient-enp6s0.pid
Restart=always
RestartSec=5s

[Install]
WantedBy=default.target

Notice that it does not have the -d flag, so it forks and runs as a background daemon. To make systemd aware of that, it's necessary to set Type=forking. Lastly, I gave it a unique PID file so that it wouldn't conflict with other dhclient instances for different interfaces if I ever needed to run them.

Finally, I used systemctl edit docker to make it depend on the new service:

[Unit]
Requires=docker-dhcp-pd.service
After=docker-dhcp-pd.service

After flushing and reloading the firewall rules, and a clean restart of all the services, I once again ran the tests and they were successful. Two containers on the host were able to connect to each other, the containers were able to connect to the host and vice versa, and the laptop on another subnet was also able to connect to the containers.

One last problem was that containers could not connect to the router's DNS server's IPv6 address. To be precise, they would connect and send their query, but the server would immediately respond with REFUSED. I use pfSense's default DNS server, unbound, and I found it disallows queries from hosts it does not recognize. By default it only allows hosts in the DHCP ranges of each NIC, so it does not include hosts in the NICs' delegated prefixes. I solved this by adding an "Access List" in the Services/DNS Resolver/Access Lists section with a network that covered the entire /48 I had.
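
(Under the hood, that setting presumably corresponds to an unbound access-control entry roughly like the following, with the real prefix in place of the example one.)

access-control: 2001:db8::/48 allow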

Level 4: Turn off IPv4, aka "How to run tayga on pfSense"

This last one is the most idealistic. The idea was to disable IPv4 DHCP on the LAN entirely so that all devices would only get IPv6 addresses. You would think this would mean the devices could no longer access IPv4-only servers on the internet, but two technologies help with this. Since IPv4 addresses are 32 bits, they easily fit in the lower 32 bits of any IPv6 address. Thus one can take a /96 that isn't being used by an actual address and use it to map IPv4 addresses to IPv6 addresses. The standard prefix reserved for this is 64:ff9b::/96 (though you can also use any /96 that belongs to you and isn't already used by any other subnet).
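
For example, embedding an address from the IPv4 documentation range is just a matter of writing its four octets as the low 32 bits under the prefix:

192.0.2.1  =  c0.00.02.01 in hex  ->  64:ff9b::c000:201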

So you need a DNS server that maps IPv4 addresses to IPv6 addresses under 64:ff9b::/96 and returns synthesized AAAA records with those addresses; this is called DNS64. Then you need a stateful NAT that translates IPv6 packets addressed under 64:ff9b::/96 into IPv4 packets for the underlying IPv4 address (and translates the replies back); this is called NAT64. Ideally both of these would run on the router so that none of the other devices would need an IPv4 address themselves to perform this translation.

Enabling DNS64 on pfSense is straightforward - this document lists the config options to set. For pfSense, these options go in the "Custom options" textarea on the DNS resolver settings page.

server:module-config: "dns64 validator iterator"
server:dns64-prefix: 64:ff9b::/96

After restarting the unbound service, DNS queries started returning these mapped addresses as expected. For example, nslookup ipinfo.io returned both 216.239.36.21 and 64:ff9b::d8ef:2415 and you can see d8ef:2415 indeed corresponds to 216.239.36.21. Note that DNS64 only takes effect for domains that don't already have AAAA records. So nslookup example.org will not return a synthesized AAAA record, only the real one.

Enabling NAT64 was less straightforward. Ideally it would be done by the firewall since doing stateful NAT is already the firewall's job. Unfortunately pfSense uses pf, and FreeBSD's pf does not support NAT64. FreeBSD's pf is forked from OpenBSD's pf, and OpenBSD's pf does have NAT64 support, but the patches apparently cannot be backported to FreeBSD because the two codebases have diverged a lot since the fork. FreeBSD's ipfw firewall does support it, but pfSense does not use it.

The other way to do NAT64 is to use a user-space daemon. A popular software package for this is tayga. There are a few tutorials on the internet for setting up tayga in a Linux VM and configuring pfSense to route the prefixed traffic to the VM, which then converts it and routes it back. But I wanted it to run on the router because I did not want any other LAN device to have an IPv4 address. As it happens, tayga is in the FreeBSD repository too, so it can be installed after enabling the FreeBSD repo.

# Enable the FreeBSD repo. pkg reads the files in alphabetical order,
# so the override needs to be named such that it comes after the existing files placed by pfSense.
# The z_ prefix does that.
ln -s /etc/pkg/FreeBSD.conf /usr/local/etc/pkg/repos/z_overrides.conf

# Verify that the FreeBSD repo is now enabled.
pkg -vv

# Install the tayga package. This may require updating pkg itself first
# since the one in FreeBSD's repo is newer than the one in pfSense's repo.
pkg install tayga

Setting tayga up needed some more work. OPNsense recently added some support for running tayga as a plugin, so I was able to copy some of the work they did. This GitHub comment and this GitHub comment were very useful, as was the actual implementation of the plugin here.

I configured tayga by editing /usr/local/etc/tayga.conf:

tun-device nat64
ipv4-addr 192.168.255.1
ipv6-addr 2001:db8:0:5::1
prefix 64:ff9b::/96
dynamic-pool 192.168.255.0/24
data-dir /var/db/tayga

All the settings here are default except for ipv6-addr and prefix. ipv6-addr is optional if prefix is not 64:ff9b::/96, but I wanted prefix to be exactly that value, so I had to set ipv6-addr to an address that is routed to my router but not already used. That meant it had to be under the /48 I received from HE but not in any of the /64s my LAN NICs were already using.

The next step was to add tayga as a service. I created /usr/local/etc/rc.d/tayga based on the OPNSense script:

#!/bin/sh
#
# $FreeBSD$
#
# PROVIDE: tayga
# REQUIRE: SERVERS
# KEYWORD: shutdown
#

. /etc/rc.subr

name='tayga'

start_cmd='tayga_start'
stop_cmd='tayga_stop'
rcvar='tayga_enable'

load_rc_config 'tayga'
pidfile="/var/run/${name}.pid"
command="/usr/local/sbin/${name}"
command_args="-p ${pidfile}"

[ -z "$tayga_enable" ] && tayga_enable='YES'

tayga_start() {
    "$command" $command_args
    while ! ifconfig 'nat64'; do sleep 1; done
    ifconfig 'nat64' inet '192.168.254.1/32' '192.168.255.1'
    ifconfig 'nat64' inet6 '2001:db8:0:5::1/128'
    route -6 add '64:ff9b::/96' -interface 'nat64'
    route -4 add '192.168.255.0/24' -interface 'nat64'
}

tayga_stop() {
    if [ -n "$rc_pid" ]; then
        echo 'stopping tayga'
        kill -2 "${rc_pid}"
        ifconfig 'nat64' destroy
    else
        echo "${name} is not running."
    fi
}

run_rc_command "$1"

The name of the interface, and the addresses used in the ifconfig 'nat64' inet, ifconfig 'nat64' inet6, route -6 add and route -4 add commands, match the values in tayga.conf (192.168.254.1 is just an otherwise unused address for the host's end of the tun interface). Then I chmod +x'd the file and ran service tayga start to start it. ifconfig nat64 showed the interface with the IPs attached to it, and the routes were visible in pfSense's web UI.
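
Spelled out as commands, that was:

chmod +x /usr/local/etc/rc.d/tayga
service tayga start
ifconfig nat64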

Then, in the pfSense web UI, the interface assignments page showed the nat64 address as available to be assigned. I did that and it was assigned the default name OPT4. Then I added a firewall rule for the OPT4 interface with action "Pass", address family "IPv4+IPv6", protocol "Any" and "any" source and destination. I also went to Firewall/NAT/Outbound and added a custom mapping for interface "WAN" (the interface corresponding to em0, connected to my ISP), address family "IPv4+IPv6", protocol "any", source "Network" with range 192.168.255.0/24 (the range configured in tayga.conf) and destination "Any". I also switched the "Outbound NAT Mode" setting from "Automatic" to "Hybrid" so that the custom rule would take effect.

I could now see that curl -6 on my desktop was able to fetch IPv4-only hosts.

Unfortunately, there was a problem with setting it up this way. The interface had to be assigned in pfSense so that I could add the firewall and outbound NAT rules for it, but this means the interface is registered in pfSense's config.xml even though it's dynamically generated when the tayga service starts. When I rebooted the router, it noticed that the nat64 interface no longer existed, and went into reconfiguration mode where I would have to set up all the interfaces again. The second GitHub comment mentions this problem too:

I also noticed that this causes issues on reboots. The nat64 interface probably doesn't exist yet when OPNsense configures its interfaces during startup. So saving the interface in the OPNsense config might not be the best choice.

So I needed a way to add the firewall and NAT rules without registering the interface with pfSense, ideally from the tayga service's start action itself. First, I investigated whether it was possible to use the /usr/local/bin/easyrule script that pfSense provides for scripting the firewall rules, but this also requires the interface to be registered with pfSense so it wouldn't have worked. Then I checked whether it would be possible to use pfctl directly. The rules file that pfSense uses is at /tmp/rules.debug, and among other things it contains:

nat-anchor "natrules/*"
anchor "userrules/*"

So it's indeed possible to add rules under those anchors and have them be picked up by pf automatically. I extracted all the rules related to nat64 and 192.168.255.0 from /tmp/rules.debug, i.e. the rules that pfSense had added when the interface was registered through the UI, and edited the tayga_start function to add those rules using pfctl:

     ifconfig 'nat64' inet6 '2001:db8:0:5::1/128'
     route -6 add '64:ff9b::/96' -interface 'nat64'
     route -4 add '192.168.255.0/24' -interface 'nat64'
+
+    ll_addr="$(ifconfig nat64 | awk 'match($0, /^\tinet6 (fe80:.*)%nat64 /) { print $2; }' | sed -e 's/%nat64$//')"
+    printf "
+scrub on nat64 all fragment reassemble
+block drop in log on ! nat64 inet6 from 2001:db8:0:5::1 to any
+block drop in log on nat64 inet6 from $ll_addr to any
+block drop in log on ! nat64 inet from 192.168.254.1 to any
+pass in quick on nat64 inet all flags S/SA keep state
+pass in quick on nat64 inet6 all flags S/SA keep state
+" | pfctl -a userrules/tayga -f -
+
+    wan_addr="$(ifconfig em0 | awk '/^\tinet / { print $2 }' | head -n1)"
+    if [ -n "$wan_addr" ]; then
+        printf "
+nat on em0 inet from 192.168.255.0/24 to any -> $wan_addr port 1024:65535
+" | pfctl -a natrules/tayga -f -
+    fi

(Note: every invocation of pfctl -f for a particular anchor specified by -a replaces any previous rules in that anchor. So if you need to delete the rules, simply run the same commands with echo '' instead of printf '...'.)
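
To see what is currently loaded into the anchors, pfctl can list them directly:

pfctl -a userrules/tayga -s rules
pfctl -a natrules/tayga -s nat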

The source IP used in the first block rule is the ipv6-addr from tayga.conf.

In the NAT rule, em0 is the IPv4 WAN interface. I'm not sure how to make the rule update dynamically if the interface's IP changes. For now I'll have to restart the tayga service whenever that happens.

Then I removed the firewall rule, outbound NAT rule, and finally the whole interface from pfSense. I then restarted the tayga service, and was still able to see the synthesized AAAA records. Lastly, I changed the enp4s0 network file to only request DHCPv6:

 [Match]
 Name=enp4s0
 
 [Network]
-DHCP=yes
+DHCP=ipv6

... and restarted the network. I confirmed that processes on the desktop were able to continue working, as were Docker containers.

I now had a network that was completely IPv6, yet was still able to interoperate with the IPv4 internet.

Alas...

Epilogue

... it turned out a few things I use regularly don't work in a pure IPv6 + NAT64 environment.

One of them is Steam. The precise reason is unknown.

The other is bittorrent. Bittorrent tracker messages include peer IP addresses inside the message payload, so the NAT64 cannot rewrite those IPv4 addresses with the prefix (unless it did deep packet inspection or was otherwise application-protocol-aware). Therefore the bittorrent client ends up with IPv4 addresses that it cannot use and becomes unable to find any peers to connect to. It would be possible for a bittorrent client to let the user configure it with the prefix, so that it could itself convert IPv4 addresses to IPv6 addresses, but the client I use does not have such a capability and I did not find any other that might.

This is poetic in a way, since regular IPv4-to-IPv4 NAT also has these problems.

As a result, I did unfortunately have to roll back the IPv6-only idealism and let enp4s0 also obtain an IPv4 address. I have left the NAT64 setup running for now, so that applications that don't have a problem working with the NATted IPs can continue doing so.