This post shares some tips for developing changes to the submariner gateway and testing them in a live environment. While the development environment can create kind-based clusters on your laptop, it’s sometimes necessary to test the integration with specific clouds as you develop.
Please bear in mind that some of the details are specific to the development branch during the 0.9 cycle of submariner, but the imageOverride technique will remain valid.
I normally use a couple of scripts: one for building and pushing my images, and another for deploying them.
This is the builder:
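# Remove the previous build outputs so make rebuilds the binary and the image
rm -f package/.image.submariner bin/linux/amd64/submariner-engine
make bin/linux/amd64/submariner-engine package/.image.submariner
# Retag the locally built image for my own repository and push it
docker tag quay.io/submariner/submariner:devel quay.io/mangelajo/submariner:devel
docker push quay.io/mangelajo/submariner:devel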
The deployer modifies the submariner definition handled by the submariner-operator, and overrides the submariner (gateway) image with quay.io/mangelajo/submariner:devel, which is the image I build and push with the previous script.
Then it kills the existing pods for the submariner gateway, which forces them to reload.
Please note that I’m using the “devel” version, which makes the operator set the ImagePullPolicy to Always, so the remote repo is re-checked for new images and the local cache is ignored every time a pod is started.
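The deployer script itself is not reproduced here, so what follows is only a minimal sketch of the two steps just described. It assumes the Submariner resource is called `submariner` and lives in the `submariner-operator` namespace, that the override key is `submariner`, and that the gateway pods carry the label `app=submariner-engine`. Please verify all of these against your own deployment. `build_patch`, `deploy` and the `APPLY` switch are hypothetical names used for illustration.

```shell
#!/bin/sh
# Sketch only: resource name, namespace, override key and pod label are
# assumptions, check them against your cluster before using this.
IMAGE="${IMAGE:-quay.io/mangelajo/submariner:devel}"
NAMESPACE="${NAMESPACE:-submariner-operator}"

# Print a JSON merge patch that overrides one component image.
# $1 = component key (assumed to be "submariner"), $2 = image reference.
build_patch() {
    printf '{"spec":{"imageOverrides":{"%s":"%s"}}}\n' "$1" "$2"
}

# Patch the Submariner resource, then delete the gateway pods so that they
# are recreated with the overridden image.
deploy() {
    kubectl -n "$NAMESPACE" patch submariner submariner \
        --type=merge -p "$(build_patch submariner "$IMAGE")"
    kubectl -n "$NAMESPACE" delete pod -l app=submariner-engine
}

# Only touch the cluster when explicitly asked to (APPLY=1); otherwise just
# print the patch so it can be reviewed.
if [ "${APPLY:-0}" = "1" ]; then
    deploy
else
    build_patch submariner "$IMAGE"
fi
```

Running it without `APPLY=1` only prints the patch, which is a cheap way to check the JSON before pointing it at a real cluster.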
HA and Distributed are beautiful words, but complex ones. Within a few seconds MAC addresses flap in the switch. It is for a good cause, but this annoys the admin, who runs to disable port flapping detection in the switch, and then he breathes.
I guess you can picture the situation, which I have seen happen a few times. He already needs to disable port flapping detection for L3HA to work, so it’s not a big deal.
In the next few pages I’m explaining how OVN/L3 works over VLAN once Anil’s patches are in place, and I review the life of a couple of ICMP packets through the network.
As the diagram shows, the example network is composed of:
Gateway Node 1
Gateway Node 2
Compute Node A
Compute Node B
3 Physical networks:
Interface 1: VLAN provider network: the network for logical switch traffic; each logical switch will have its own VLAN ID.
Interface 2: The overlay network: although it’s not used for carrying traffic, we still rely on BFD monitoring over the overlay network for L3HA purposes (deciding on the master/backup state).
Logical switch A, with CIDR 188.8.131.52/24 and a localnet port to the VLAN provider network, tag: 2011
Logical switch B, with CIDR 184.108.40.206/24 and a localnet port to the VLAN provider network, tag: 2010
1 Logical Router:
R1 which has three logical router ports:
On LS A
On LS B
On external network
The journey of the packet
In the next lines I’m tracking the small ICMP echo packet on its journey through the different network elements. You can see a detailed route plan in Packet 1 ovn-trace.
The packet is created inside VM1, where it has a virtual interface with the 220.127.116.11/24 address (fa:16:3e:16:07:92 MAC), and a default route to 18.104.22.168 (fa:16:3e:7e:d6:7e) for anything outside its subnet.
On its way out of VM1
As the packet is handled by the br-int OpenFlow rules for the logical router pipeline, the source MAC address is replaced with that of the router logical port on logical switch B, and the destination MAC is replaced with the destination port MAC on VM4. Afterwards, the destination network VLAN tag for logical switch B is attached to the packet.
The physical switch
The packet leaves the virtual switch br-phy, through interface 1, reaching the Top of Rack switch.
The ToR switch CAM table is updated for 2010 fa:16:3e:65:f2:ae which is R1’s leg into virtual network B (logical switch B).
vid + MAC
Going into VM4
As the packet arrives at the hypervisor, its VLAN tag is stripped and it is directed to the VM4 tap.
VM4 receives the ICMP request and responds to it with an ICMP echo reply. The new packet is directed to R1’s MAC and VM1’s IP address.
On its way out of VM4
As the packet is handled by the br-int OpenFlow rules for the logical router pipeline, the source MAC address is replaced with that of the router logical port on logical switch A, and the destination MAC is replaced with the destination port MAC on VM1.
Afterwards, the destination network VLAN tag for logical switch A is attached to the packet.
The physical switch (on its way back)
The packet leaves the virtual switch br-phy, through interface 9, reaching the Top of Rack Switch.
The ToR switch CAM table is updated for 2011 fa:16:3e:7e:d6:7e on port 9 which is R1’s leg into virtual network A (logical switch A).
vid + MAC
At the end of its journey, the ICMP packet crosses br-phy, where the OpenFlow rules will decapsulate it from the localnet port into LS A, and direct the packet to VM1, as the eth.dst matches VM1’s MAC address.
VM1 receives the packet normally, coming from VM4 (22.214.171.124) through our virtual R1 (fa:16:3e:7e:d6:7e).
The end? Oh no
We need to explore the case where we have ongoing communications from VM6 to VM3, and from VM1 to VM4. Both cases are East/West traffic, and they will make the R1 MAC addresses flap in the ToR switch CAM table.
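To make the flapping concrete, here is a toy model of a ToR CAM table keyed by VLAN ID + MAC. It is only a sketch: the port numbers (1 and 3) are made up, and it assumes that both flows are routed into logical switch B on two different hypervisors, so that both nodes emit frames carrying R1’s MAC on VLAN 2010. `learn` is a hypothetical helper, not anything a real switch exposes.

```shell
#!/bin/sh
# Toy CAM table: one "vid mac port" entry per line, keyed by vid + MAC.
CAM=""
MOVES=0

# learn VID MAC PORT: record the port where a source MAC was last seen and
# count a move whenever a known (vid, MAC) key shows up on a different port.
learn() {
    key="$1 $2"
    old_port=$(printf '%s\n' "$CAM" | awk -v k="$key" '($1 " " $2) == k { print $3 }')
    if [ -n "$old_port" ] && [ "$old_port" != "$3" ]; then
        MOVES=$((MOVES + 1))
        echo "flap: $key moved from port $old_port to port $3"
    fi
    # Drop any previous entry for this key, then append the new one.
    CAM=$(printf '%s\n' "$CAM" | awk -v k="$key" 'NF && ($1 " " $2) != k')
    CAM=$(printf '%s\n%s %s\n' "$CAM" "$key" "$3")
}

R1_LS_B="fa:16:3e:65:f2:ae"   # R1's leg on logical switch B (VLAN 2010)

# Each hypervisor routes locally, so both send frames with R1's source MAC.
learn 2010 "$R1_LS_B" 1   # VM1 -> VM4, routed on the node behind port 1 (assumed)
learn 2010 "$R1_LS_B" 3   # VM6 -> VM3, routed on the node behind port 3 (assumed)
learn 2010 "$R1_LS_B" 1
learn 2010 "$R1_LS_B" 3
echo "moves: $MOVES"
```

With the two flows interleaved like this, every frame after the first one relocates the entry, which is exactly the pattern that port flapping detection is designed to catch.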
E/W or East/West : This is the kind of traffic that traverses a router from one subnet to another subnet, going through two legs of a router.
N/S or North/South : This kind of traffic flow is very similar to E/W, but we make the distinction, at least in the world of virtual networks, when the router has connectivity to an external network: it is traffic that traverses the router into or from an external network. In the case of OVN or OpenStack, it implies the use of DNAT and/or SNAT in the router, to translate internal addresses into external addresses and back.
L3HA : Highly available L3 service, which eliminates any single point of failure on the routing service of the virtual network.
ToR switch : Top of Rack switch, the switch generally installed at the top of a rack and connected to all the servers in that rack. It provides L2 connectivity.
CAM table : CAM means Content Addressable Memory. It’s a specific type of memory that, instead of being accessed by address, is accessed by “key”. In the case of switches, for the MAC table, the key is MAC + VLAN ID.
One of the simplest and least memory-hungry ways to deploy a tiny networking-ovn all-in-one is still to use packstack.
Note: This is an unsupported way of deployment (for the product version), but it should be just fine to give it a try. Afterwards, if you want to get serious and try something closer to production, please have a look at the TripleO deployment guide.
In this blog post I will explain how to connect private tenant networks to an external network without the use of NAT or DNAT (floating IPs) via a neutron router.
With the following configuration you will have routers that don’t do NAT on ingress or egress connections. You won’t be able to use floating IPs either, at least in the configuration explained here; you could add a second external network and a gateway port, plus some static routes, to let the router steer traffic over each network.
This can be done with just one limitation: tenant networks connected in such a topology cannot have overlapping IPs. Also, the upstream router(s) must be informed about the internal networks so they can add routes to those networks. That can be done manually, or automatically by periodically querying the OpenStack API (checking the router interfaces, the subnets, etc.), but I’ll skip that part in this blog post.
We’re going to assume that our external provider network is “public”, with the subnet “public_subnet”, and that the CIDR of that network is 192.168.1.0/24 with a gateway on 192.168.1.1. Please excuse me for not using the new openstack commands; I can write an updated post later if somebody is interested.
We can test it locally (assuming our host has access to 192.168.1.102)
# test local...
ip route add 10.222.0.0/16 via 192.168.1.102
# and assuming that our instance was created with 10.222.0.3
[root@server ~(keystone_admin)]# ping 10.222.0.3
PING 10.222.0.3 (10.222.0.3) 56(84) bytes of data.
64 bytes from 10.222.0.3: icmp_seq=1 ttl=63 time=0.621 ms
64 bytes from 10.222.0.3: icmp_seq=2 ttl=63 time=0.298 ms
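If several tenant networks end up behind routers like this, the static route above has to be repeated once per network. A small sketch that turns a list of “CIDR gateway” pairs into `ip route` commands could look like the following. `routes_from_list` is a hypothetical helper, it only prints the commands instead of running them, and it uses `ip route replace` (rather than `add`) so that re-running the output is harmless.

```shell
#!/bin/sh
# Print one "ip route replace" command per "CIDR GATEWAY" line read from stdin.
# Lines that are empty or start with '#' are skipped.
routes_from_list() {
    while read -r cidr gateway; do
        case "$cidr" in
            ''|'#'*) continue ;;
        esac
        echo "ip route replace $cidr via $gateway"
    done
}

routes_from_list <<'EOF'
# tenant CIDR    router address on the external network
10.222.0.0/16    192.168.1.102
EOF
```

The printed lines can be reviewed and then piped to `sh` as root on the host (or translated into the equivalent configuration on the upstream router).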