
OVN Distributed East/West and L3HA routing on VLAN

HA and Distributed are beautiful words, but complex ones. Within a few seconds MAC addresses start flapping in the switch. It is for a good cause, but it annoys the admin, who runs to disable port-flapping detection in the switch and then breathes again.

I guess you can picture the situation; I have seen it happen a few times. The admin already needs to disable port-flapping detection for L3HA to work, so it's not a big deal.

In the next few pages I explain how OVN L3 routing works over VLAN networks once Anil's patches are in place, and I review the life of a couple of ICMP packets through the network.

The network

As the diagram shows, the example network is composed of the following (a rough ovn-nbctl sketch of this topology follows the list):

  • 4 Chassis:
    • Gateway Node 1
    • Gateway Node 2
    • Compute Node A
    • Compute Node B
  • 3 Physical networks:
    • Interface 1: The VLAN provider network: the network for logical switch traffic; each logical switch has its own VLAN ID.
    • Interface 2: The overlay network: although it is not used to carry traffic, we still rely on BFD monitoring over the overlay for L3HA purposes (deciding the master/backup state).
    • Interface 3: Internet/Provider Network: Our external network.
  • 2 Logical Switches (or virtual networks) that matter here:
    • A with CIDR 20.1.0.0/24 and a localnet port to the VLAN provider network, tag 2011
    • B with CIDR 20.0.0.0/24 and a localnet port to the VLAN provider network, tag 2010
    • C, which is irrelevant for this example
  • 1 Logical Router:
    • R1 which has three logical router ports:
      • On LS A
      • On LS B
      • On external network
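
To make the setup more concrete, here is a rough ovn-nbctl sketch of a topology like the one above. All names (lsA, lsB, physnet1, the lrp-*/ln-* port names) and the bridge mapping are assumptions for illustration only; they are not taken from the deployment that produced the trace in the annex. The MACs and gateway addresses are the ones mentioned in this post.

# map the provider network name to the provider bridge on each chassis
ovs-vsctl set open . external-ids:ovn-bridge-mappings=physnet1:br-phy

# two logical switches, each with a localnet port tagged with its VLAN
ovn-nbctl ls-add lsA
ovn-nbctl lsp-add lsA ln-lsA -- lsp-set-type ln-lsA localnet \
          -- lsp-set-addresses ln-lsA unknown \
          -- lsp-set-options ln-lsA network_name=physnet1 \
          -- set Logical_Switch_Port ln-lsA tag=2011

ovn-nbctl ls-add lsB
ovn-nbctl lsp-add lsB ln-lsB -- lsp-set-type ln-lsB localnet \
          -- lsp-set-addresses ln-lsB unknown \
          -- lsp-set-options ln-lsB network_name=physnet1 \
          -- set Logical_Switch_Port ln-lsB tag=2010

# the distributed router with one leg on each switch
ovn-nbctl lr-add R1
ovn-nbctl lrp-add R1 lrp-lsA fa:16:3e:7e:d6:7e 20.1.0.1/24
ovn-nbctl lsp-add lsA lsA-R1 -- lsp-set-type lsA-R1 router \
          -- lsp-set-addresses lsA-R1 router \
          -- lsp-set-options lsA-R1 router-port=lrp-lsA
ovn-nbctl lrp-add R1 lrp-lsB fa:16:3e:65:f2:ae 20.0.0.1/24
ovn-nbctl lsp-add lsB lsB-R1 -- lsp-set-type lsB-R1 router \
          -- lsp-set-addresses lsB-R1 router \
          -- lsp-set-options lsB-R1 router-port=lrp-lsB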

The journey of the packet

In the next lines I track a small ICMP echo packet on its journey through the different network elements. You can find a detailed route plan in the Packet 1 ovn-trace annex.

The inception

The packet is created inside VM1, which has a virtual interface with the 20.1.0.11/24 address (MAC fa:16:3e:16:07:92) and a default route to 20.1.0.1 (fa:16:3e:7e:d6:7e) for anything outside its subnet.
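
A quick way to double-check this from inside VM1, using nothing but standard iproute2 commands (eth0 is assumed as the guest interface name):

# expected: 20.1.0.11/24, a default route via 20.1.0.1, and the router
# MAC fa:16:3e:7e:d6:7e once 20.1.0.1 has been ARP-resolved
ip -4 addr show dev eth0
ip route show
ip neigh show 20.1.0.1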

On its way out of VM1

As the packet is handled by the br-int OpenFlow rules for the logical router pipeline, the source MAC address is replaced with that of the router's logical port on logical switch B, and the destination MAC is replaced with the MAC of VM4's port. Afterwards, the VLAN tag of the destination network (logical switch B) is attached to the packet.

The physical switch

The packet leaves the virtual switch br-phy through interface 1, reaching the Top of Rack switch.

The ToR switch CAM table is updated with 2010 fa:16:3e:65:f2:ae, which is R1's leg into virtual network B (logical switch B).

vid + MAC               port  age
2010 fa:16:3e:65:f2:ae  1     0
2011 fa:16:3e:16:07:92  1     12
2010 fa:16:3e:75:ca:89  9     10

Going into VM4

As the packet arrives at the hypervisor, the VLAN tag is stripped and the packet is directed to VM4's tap interface.

The return

VM4 receives the ICMP request and responds to it with an ICMP echo reply. The new packet is directed to R1’s MAC and VM1’s IP address.

On its way out of VM4

As the packet is handled by the br-int OpenFlow rules for the logical router pipeline, the source MAC address is replaced with that of the router's logical port on logical switch A, and the destination MAC is replaced with the MAC of VM1's port.

Afterwards, the VLAN tag of the destination network (logical switch A) is attached to the packet.

The physical switch (on its way back)

The packet leaves the virtual switch br-phy through interface 9, reaching the Top of Rack switch.

The ToR switch CAM table is updated with 2011 fa:16:3e:7e:d6:7e on port 9, which is R1's leg into virtual network A (logical switch A).

vid + MAC               port  age
2010 fa:16:3e:65:f2:ae  1     1
2011 fa:16:3e:16:07:92  1     12
2011 fa:16:3e:7e:d6:7e  9     0
2010 fa:16:3e:75:ca:89  9     10
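
The ToR's CAM table is not the only place doing MAC learning; the OVS bridge carrying the provider network learns too. If you want to peek at that side, something like this should work (br-phy is the bridge name used in this post; adjust to your deployment):

# dump the OVS MAC-learning table: columns are port, VLAN, MAC, age
sudo ovs-appctl fdb/show br-phy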

The end

By the end of its journey, the ICMP reply crosses br-phy, where the OpenFlow rules strip the VLAN tag coming from the localnet port into LS A and direct the packet to VM1, as eth.dst matches VM1's MAC address.

VM1 receives the packet normally, coming from VM4 (20.0.0.10) through our virtual R1 (fa:16:3e:7e:d6:7e).

The end? Oh no

We still need to explore the case where there are ongoing communications from VM6 to VM3 and from VM1 to VM4 at the same time. Both are East/West traffic, and together they will make R1's MAC addresses flap between ports in the ToR switch CAM table.
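
One way to actually watch that flap, as a rough sketch: mirror the ToR ports (or capture on the ToR itself) and filter on one of R1's MACs. The interface name eth1 is just a placeholder for whatever interface sees the mirrored traffic.

# with both E/W flows active, the same source MAC shows up alternately
# from the two chassis, which is exactly what makes the CAM entry flap
sudo tcpdump -e -n -i eth1 vlan and ether src fa:16:3e:65:f2:ae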

Annex

Packet 1 ovn-trace

$ ovn-trace --detailed neutron-0901bce9-c812-4fab-9844-f8ac1cdee066 'inport == "port-net2" && eth.src == fa:16:3e:16:07:92 && eth.dst==fa:16:3e:7e:d6:7e && ip4.src==20.1.0.11 && ip4.dst==20.0.0.10 && ip.ttl==32'
# ip,reg14=0x4,vlan_tci=0x0000,dl_src=fa:16:3e:16:07:92,dl_dst=fa:16:3e:7e:d6:7e,nw_src=20.1.0.11,nw_dst=20.0.0.10,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=32

ingress(dp="net2", inport="port-net2")
--------------------------------------
 0. ls_in_port_sec_l2 (ovn-northd.c:3847): inport == "port-net2" && eth.src == {fa:16:3e:16:07:92}, priority 50, uuid 72657159
    next;
 1. ls_in_port_sec_ip (ovn-northd.c:2627): inport == "port-net2" && eth.src == fa:16:3e:16:07:92 && ip4.src == {20.1.0.11}, priority 90, uuid 2bde621e
    next;
 3. ls_in_pre_acl (ovn-northd.c:2982): ip, priority 100, uuid 6a0c272e
    reg0[0] = 1;
    next;
 5. ls_in_pre_stateful (ovn-northd.c:3109): reg0[0] == 1, priority 100, uuid 00eac4fb
    ct_next;

ct_next(ct_state=est|trk /* default (use --ct to customize) */)
---------------------------------------------------------------
 6. ls_in_acl (ovn-northd.c:3292): !ct.new && ct.est && !ct.rpl && ct_label.blocked == 0 && (inport == "port-net2" && ip4), priority 2002, uuid 25b34866
    next;
16. ls_in_l2_lkup (ovn-northd.c:4220): eth.dst == fa:16:3e:7e:d6:7e, priority 50, uuid 21005439
    outport = "575fb1";
    output;

egress(dp="net2", inport="port-net2", outport="575fb1")
-------------------------------------------------------
 1. ls_out_pre_acl (ovn-northd.c:2938): ip && outport == "575fb1", priority 110, uuid 6d74b82c
    next;
 9. ls_out_port_sec_l2 (ovn-northd.c:4303): outport == "575fb1", priority 50, uuid d022b28d
    output;
    /* output to "575fb1", type "patch" */

ingress(dp="R1", inport="lrp-575fb1")
-------------------------------------
 0. lr_in_admission (ovn-northd.c:4871): eth.dst == fa:16:3e:7e:d6:7e && inport == "lrp-575fb1", priority 50, uuid 010fb48c
    next;
 7. lr_in_ip_routing (ovn-northd.c:4413): ip4.dst == 20.0.0.0/24, priority 49, uuid 4da9c83a
    ip.ttl--;
    reg0 = ip4.dst;
    reg1 = 20.0.0.1;
    eth.src = fa:16:3e:65:f2:ae;
    outport = "lrp-db51e2";
    flags.loopback = 1;
    next;
 8. lr_in_arp_resolve (ovn-northd.c:6010): outport == "lrp-db51e2" && reg0 == 20.0.0.10, priority 100, uuid 89c23f94
    eth.dst = fa:16:3e:76:ca:89;
    next;
10. lr_in_arp_request (ovn-northd.c:6188): 1, priority 0, uuid 94e042b9
    output;

egress(dp="R1", inport="lrp-575fb1", outport="lrp-db51e2")
----------------------------------------------------------
 3. lr_out_delivery (ovn-northd.c:6216): outport == "lrp-db51e2", priority 100, uuid a127ea78
    output;
    /* output to "lrp-db51e2", type "patch" */

ingress(dp="net1", inport="db51e2")
-----------------------------------
 0. ls_in_port_sec_l2 (ovn-northd.c:3829): inport == "db51e2", priority 50, uuid 04b4900d
    next;
 3. ls_in_pre_acl (ovn-northd.c:2885): ip && inport == "db51e2", priority 110, uuid fe072d82
    next;
16. ls_in_l2_lkup (ovn-northd.c:4160): eth.dst == fa:16:3e:76:ca:89, priority 50, uuid 3a1af0d6
    outport = "a0d121";
    output;

egress(dp="net1", inport="db51e2", outport="a0d121")
----------------------------------------------------
 1. ls_out_pre_acl (ovn-northd.c:2933): ip, priority 100, uuid ffea7ed3
    reg0[0] = 1;
    next;
 2. ls_out_pre_stateful (ovn-northd.c:3054): reg0[0] == 1, priority 100, uuid 11c5e570
    ct_next;

ct_next(ct_state=est|trk /* default (use --ct to customize) */)
---------------------------------------------------------------
 4. ls_out_acl (ovn-northd.c:3289): ct.est && ct_label.blocked == 0 && (outport == "a0d121" && ip), priority 2001, uuid f9826b44
    ct_commit(ct_label=0x1/0x1);

Glossary

  • E/W or East/West : This is the kind of traffic that traverses a router from one subnet to another subnet, going through two legs of a router.
  • N/S or North/South : This kind of traffic is very similar to E/W, but in the world of virtual networks we make a distinction when the router has connectivity to an external network: it is the traffic that traverses the router into or from that external network. In the case of OVN or OpenStack, it implies the use of DNAT and/or SNAT in the router to translate internal addresses into external addresses and back.
  • L3HA : Highly available L3 service, which eliminates any single point of failure on the routing service of the virtual network.
  • ToR switch : Top of Rack switch, the switch generally placed at the top of a rack and connected to all the servers in that rack. It provides L2 connectivity.
  • CAM table : CAM means Content Addressable Memory, a specific type of memory that is accessed by key instead of by address; in the case of a switch's MAC table, the key is MAC + VLAN ID.

Neutron external network with routing (no NAT)

In this blog post I will explain how to connect private tenant networks to an external network without the use of NAT or DNAT (floating ips) via a neutron router.

With the following configuration you will have routers that don't do NAT on ingress or egress connections. You won't be able to use floating IPs either, at least in the configuration explained here; you could add a second external network and a gateway port, plus some static routes, to let the router steer traffic over each network.

This can be done with just one limitation: tenant networks connected in such a topology cannot have overlapping IPs. Also, the upstream router(s) must be informed about the internal networks so they can add routes to them. That can be done manually, or automatically by periodically talking to the OpenStack API (checking the router interfaces, the subnets, etc.), but I'll skip that part in this blog post.
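
Just to give an idea of what that automation could look like once everything below is in place, here is a minimal sketch using the same neutron CLI as the rest of this post. It simply prints an ip route command for every subnet attached to the router (including the external one, which you would filter out), pointing at the router's external address:

# illustration only: emit "ip route add" lines for the router's subnets
EXT_IP=192.168.1.102
for SUBNET in $(neutron router-port-list router_test -f value -c fixed_ips | \
                grep -o '"subnet_id": "[^"]*"' | cut -d'"' -f4); do
    CIDR=$(neutron subnet-show $SUBNET -f value -c cidr)
    echo "ip route add $CIDR via $EXT_IP"
done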

We're going to assume that our external provider network is "public", with the subnet "public_subnet", and that the CIDR of that network is 192.168.1.0/24 with a gateway on 192.168.1.1. Please excuse me for not using the new openstack commands; I can write an updated post later if somebody is interested.
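
If the "public" network doesn't exist yet, it could be created roughly like this. The flat provider type, the physical network name extnet and the allocation pool are assumptions for this example:

# external provider network and subnet (values are illustrative)
neutron net-create public --router:external \
        --provider:network_type flat \
        --provider:physical_network extnet
neutron subnet-create --name public_subnet \
        --gateway 192.168.1.1 --disable-dhcp \
        --allocation-pool start=192.168.1.100,end=192.168.1.150 \
        public 192.168.1.0/24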

Step by step

Create the virtual router

source keystonerc_admin

neutron router-create router_test

Create the private network for our test, and add it to the router

neutron net-create private_test
PRIV_NET=$(neutron net-list | \
           awk ' /private_test/ { print $2 }')

neutron subnet-create \
       --name private_subnet $PRIV_NET 10.222.0.0/16
neutron router-interface-add router_test private_subnet

Create a security group, and add an instance

neutron security-group-create test
neutron security-group-rule-create test --direction ingress

nova boot --flavor m1.tiny --image cirros \
           --nic net-id=$PRIV_NET \
          --security-group test cirros
sleep 10
nova console-log cirros

Please note that in the console-log you may find that everything went well… and that the instance has an address through DHCP.

Create a port for the router on the external network. We create it manually so we can specify the exact address we want.

SUBNET_ID=$(neutron subnet-list | awk ' /public_subnet/ { print $2 }')
neutron port-create public --name ext-router-port --fixed-ip \
    subnet_id=$SUBNET_ID,ip_address=192.168.1.102

Add it to the router

PORT_ID=$(neutron port-show ext-router-port | \
          awk '/ id / { print $4 } ')
neutron router-interface-add router_test port=$PORT_ID

We can test it locally (assuming our host has access to 192.168.1.102)

# test local...
ip route add 10.222.0.0/16 via 192.168.1.102

# and assuming that our instance was created with 10.222.0.3

[root@server ~(keystone_admin)]# ping 10.222.0.3
PING 10.222.0.3 (10.222.0.3) 56(84) bytes of data.
64 bytes from 10.222.0.3: icmp_seq=1 ttl=63 time=0.621 ms
64 bytes from 10.222.0.3: icmp_seq=2 ttl=63 time=0.298 ms

Extra note: if you want to avoid issues with overlapping tenant network IPs, I recommend having a look at the subnet pool functionality of Neutron, which was introduced in Kilo.
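
As a quick illustration of what that looks like (the pool name, prefixes and the second network are only examples), a pool is created once and subnets then draw their CIDRs from it:

neutron subnetpool-create --pool-prefix 10.223.0.0/16 \
        --default-prefixlen 24 tenant-pool

# any subnet created from the pool gets a free, non-overlapping CIDR
neutron net-create private_test2
neutron subnet-create --name private_subnet2 \
        --subnetpool tenant-pool private_test2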

I hope you enjoyed it, and that this is helpful.


Creating a network interface to tenant network in baremetal node (neutron)

The next example illustrates how to create a test port on a bare-metal node connected to a specific tenant network. This can be useful for testing, or for connecting specific bare-metal services to tenant networks.

The bare-metal node $HOST_ID needs to run the neutron-openvswitch-agent, which will wire the port into the right tag-id, tell neutron-server that our port is ACTIVE, and set up proper L2 connectivity (external VLAN, tunnels, etc.).
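
A quick sanity check that the agent is actually registered for this host, just grepping the standard agent listing:

# expect one alive (":-)") Open vSwitch agent entry for this hostname
neutron agent-list | grep -i "open vswitch" | grep "$(hostname)"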

Set up our host id and tenant network

NETWORK_NAME=private
NETWORK_ID=$(openstack network show -f value -c id $NETWORK_NAME)
HOST_ID=$(hostname)

Create the port in neutron

PORT_ID=$(neutron port-create --device-owner \
          compute:container \
          --name test_interf0 $NETWORK_ID | awk '/ id / { print $4 }')

PORT_MAC=$(neutron port-show $PORT_ID -f value -c mac_address)

The port is not bound yet, so it will be in DOWN status

neutron port-show $PORT_ID -f value -c status
DOWN

Create the test_interf0 interface, wired to our new port

ovs-vsctl -- --may-exist add-port br-int test_interf0 \
  -- set Interface test_interf0 type=internal \
  -- set Interface test_interf0 external-ids:iface-status=active \
  -- set Interface test_interf0 external-ids:attached-mac="$PORT_MAC" \
  -- set Interface test_interf0 external-ids:iface-id="$PORT_ID"

We can now see how neutron marked this port as ACTIVE

neutron port-show $PORT_ID -f value -c status
ACTIVE

Set the MAC address and move the interface into a namespace. The namespace is important if you're using dhclient; otherwise the host-wide routes and DNS configuration would be changed. You can omit the netns if you're setting the IP address manually.

ip link set dev test_interf0 address $PORT_MAC
ip netns add test-ns
ip link set test_interf0 netns test-ns
ip netns exec test-ns ip link set dev test_interf0 up

Get IP configuration via DHCP

ip netns exec test-ns dhclient -I test_interf0 --no-pid test_interf0 -v
Internet Systems Consortium DHCP Client 4.2.5
Copyright 2004-2013 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/

Listening on LPF/test_interf0/fa:16:3e:6f:64:46
Sending on   LPF/test_interf0/fa:16:3e:6f:64:46
Sending on   Socket/fallback
DHCPREQUEST on test_interf0 to 255.255.255.255 port 67 (xid=0x5b6ddebc)
DHCPACK from 192.168.125.14 (xid=0x5b6ddebc)

Test connectivity (assuming we have DNS and a router for this subnet)

ip netns exec test-ns ping www.google.com
PING www.google.com (173.194.70.99) 56(84) bytes of data.
64 bytes from fa-in-f99.1e100.net (173.194.70.99): icmp_seq=1 ttl=36 time=115 ms
64 bytes from fa-in-f99.1e100.net (173.194.70.99): icmp_seq=2 ttl=36 time=114 ms
...
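
Not part of the original steps, but for completeness, a tear-down sketch that reverses everything done above:

ip netns exec test-ns dhclient -r test_interf0   # release the DHCP lease
ip netns delete test-ns
ovs-vsctl del-port br-int test_interf0
neutron port-delete $PORT_ID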

Neutron QoS service plugin

Finally, I've been able to record a video showing how the QoS service plugin works. If you want to deploy it, follow the instructions under the video (open it in Vimeo for better quality).

Deployment instructions

Add to your devstack/local.conf

enable_plugin neutron git://git.openstack.org/openstack/neutron
enable_service q-qos

Let's stack!

~/devstack/stack.sh

Now create security group rules to allow SSH (port 22) and ICMP traffic to the VM

source ~/devstack/accrc/demo/demo

neutron security-group-rule-create --protocol tcp       \
                                   --direction ingress  \
                                   --port-range-min 22  \
                                   --port-range-max 22  \
                                   default

neutron security-group-rule-create --protocol icmp     \
                                   --direction ingress \
                                   default

nova net-list
nova boot --image cirros-0.3.4-x86_64-uec \
          --flavor m1.tiny \
          --nic net-id=*your-net-id* qos-cirros
#wait....

nova show qos-cirros  # look for the IP
neutron port-list # look for the IP and find your *port id*

In another console, run the packet pusher

ssh cirros@$THE_IP_ADDRESS \
     'dd if=/dev/zero  bs=1M count=1000000000'

In yet another console, look for the port and monitor it

# given a port id 49d4a680-4236-4d0c-9feb-8b4990ac35b9
# look for the ovs port:
$ sudo ovs-vsctl show | grep qvo49d4a680-42
       Port "qvo49d4a680-42"
           Interface "qvo49d4a680-42"

Finally, try the QoS rules

source ~/devstack/accrc/admin/admin

neutron qos-policy-create bw-limiter
neutron qos-bandwidth-limit-rule-create bw-limiter \
        --max-kbps 3000 --max-burst-kbps 300

# after next command, the port will quickly
# go down to 3Mbps
neutron port-update *your-port-id* --qos-policy bw-limiter

You can change rules at runtime, and the ports will be updated

neutron qos-bandwidth-limit-rule-update *rule-id* \
        bw-limiter \
        --max-kbps 5000 --max-burst-kbps 500

Or you can remove the policy from the port, and traffic will quickly spike back up to the original maximum.

neutron port-update *your-port-id* --no-qos-policy