Finally, I’ve been able to record a video showing how the QoS service plugin works. If you want to deploy this, follow the instructions under the video. (Open in Vimeo for better quality: https://vimeo.com/136295066)

Deployment instructions

Add to your devstack/local.conf

enable_plugin neutron git://git.openstack.org/openstack/neutron
enable_service q-qos

Let it stack!

~/devstack/stack.sh

Now create rules to allow ICMP and traffic to port 22 (SSH) on the VM:

source ~/devstack/accrc/demo/demo

neutron security-group-rule-create --protocol tcp \
                                   --direction ingress \
                                   --port-range-min 22 \
                                   --port-range-max 22 \
                                   default

neutron security-group-rule-create --protocol icmp \
                                   --direction ingress \
                                   default

nova net-list
nova boot --image cirros-0.3.4-x86_64-uec --flavor m1.tiny \
          --nic net-id=*your-net-id* qos-cirros
#wait....

nova show qos-cirros  # look for the IP
neutron port-list # look for the IP and find your *port id*
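
The last lookup above (matching the instance IP against the port list) can be scripted. This is only a minimal sketch: the helper name port_id_for_ip is made up for this example, and it assumes the usual table output of neutron port-list, where the first column is the port id and the fixed_ips column contains the address as "ip_address": "<ip>".

```shell
# Hypothetical helper: print the id of the port whose fixed_ips contains
# the given IP. Reads `neutron port-list` table output on stdin.
port_id_for_ip() {
    awk -F'|' -v ip="$1" '
        # match the IP together with its quotes, so that 10.0.0.3
        # does not also match 10.0.0.30
        index($0, "\"" ip "\"") {
            gsub(/ /, "", $2)   # the id is the first table column
            print $2
        }'
}

# usage (assumes THE_IP_ADDRESS holds the instance IP):
# neutron port-list | port_id_for_ip "$THE_IP_ADDRESS"
```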

In another console, run the packet pusher:

ssh cirros@$THE_IP_ADDRESS \
     'dd if=/dev/zero  bs=1M count=1000000000'

In yet another console, look for the port and monitor it:

# given a port id 49d4a680-4236-4d0c-9feb-8b4990ac35b9
# look for the ovs port:
$ sudo ovs-vsctl show | grep qvo49d4a680-42
       Port "qvo49d4a680-42"
           Interface "qvo49d4a680-42"
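
In the example above, the OVS port name is "qvo" followed by the first 11 characters of the Neutron port id. A small sketch of that derivation, assuming the same naming as in this setup (the helper name ovs_port_name is made up for illustration):

```shell
# Hypothetical helper: derive the OVS port name from a Neutron port id.
# Assumes the naming seen above: "qvo" + first 11 characters of the id.
ovs_port_name() {
    printf 'qvo%s\n' "$(printf '%s' "$1" | cut -c1-11)"
}

ovs_port_name 49d4a680-4236-4d0c-9feb-8b4990ac35b9   # prints qvo49d4a680-42
```

The result can then be fed to the grep shown above, for example: sudo ovs-vsctl show | grep "$(ovs_port_name "$PORT_ID")", where PORT_ID is a variable you set yourself.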

Finally, try the QoS rules:

source ~/devstack/accrc/admin/admin

neutron qos-policy-create bw-limiter
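# the rule does not exist yet, so no rule id is passed here;
# note the id printed in the output, you will need it to update the rule
neutron qos-bandwidth-limit-rule-create bw-limiter \
                        --max-kbps 3000 --max-burst-kbps 300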

# after next command, the port will quickly go down to 3Mbps
neutron port-update *your-port-id* --qos-policy bw-limiter
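
If you have no traffic monitor at hand, the rate can be estimated from the kernel byte counters of the port. This is a rough sketch, not the method used in the video: the helper names rate_kbps and watch_rate are made up, and it assumes a Linux host exposing counters under /sys/class/net. For reference, 3000 kbps corresponds to 375000 bytes per second.

```shell
# Hypothetical helper: average rate in kbps between two byte-counter samples.
# $1 = earlier byte count, $2 = later byte count, $3 = seconds between them
rate_kbps() {
    echo $(( ($2 - $1) * 8 / 1000 / $3 ))
}

# Hypothetical helper: sample an interface counter once per second and
# print the rate. Pass rx_bytes or tx_bytes as the second argument,
# depending on the direction you want to watch (default: rx_bytes).
watch_rate() {
    dev=$1
    counter=${2:-rx_bytes}
    prev=$(cat "/sys/class/net/$dev/statistics/$counter")
    while sleep 1; do
        cur=$(cat "/sys/class/net/$dev/statistics/$counter")
        echo "$(rate_kbps "$prev" "$cur" 1) kbps"
        prev=$cur
    done
}

# usage: watch_rate qvo49d4a680-42
```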

You can change rules at runtime, and the ports will be updated:

neutron qos-bandwidth-limit-rule-update *rule-id* bw-limiter \
                        --max-kbps 5000 --max-burst-kbps 500

Or you can remove the policy from the port, and traffic will quickly spike back up to the original maximum:

neutron port-update *your-port-id* --no-qos-policy

Last week we had the OpenStack Neutron Quality of Service coding sprint in Ra’anana, Israel, to work on [1].

It’s been an amazing experience. We’ve accomplished a lot, but we still have a lot ahead. We gathered at the Red Hat office for three days [2], delivering almost (sigh!) the full stack for the QoS service with bandwidth limiting.

The first day we had a short meeting where we went over the whole picture of blocks and dependencies that we had to complete.

The people from Huawei India (hi Vikram Choudhary & Ramanjaneya Reddy) helped us remotely by bootstrapping the DB models and the neutron client.

Eran Gampel (Huawei), Irena Berezovsky (Midokura) and Mike Kolesnik (Red Hat) revised the API for REST consistency during the first day, and provided an amendment to the original spec [12], the API extension and the service plugin [13]. Concurrently, John Schwarz (Red Hat) was working on the API tests, which acted as validation of the work they were doing.

Ihar Hrachyshka (Red Hat) finished the DB models and submitted the first neutron versioned objects ever on top of the DB models [3] [4]. I recommend reading those patches, they are like the nirvana of coding ;).

Mike Kolesnik plugged in the missing callbacks for extending networks and ports. Some of those, the ones extending object reads, will be moved to a new neutron.callbacks interface.

I mostly worked on coordination and on writing some code for the generic RPC callbacks [5] to be used with versioned objects, where I had lots of help from Eran and Moshe Levi (Mellanox). The current version is very basic: it does not support object updates, only the initial retrieval of the resources, hence it is not a real callback ;) (yet!).

Eran wrote a pluggable driver backend interface for the service [6], with a default rpc/messaging backend, which fitted very nicely.

Gal Sagie (Huawei) and Moshe Levi worked at the agent level. Gal created the QoS OvS library with the ability to manipulate queues, configure the limits, and attach those queues to ports [7]. Moshe led the agent design, providing an interface for dynamic agent extensions [8], a QoS agent extension interface [9], and the example for SRIOV [10]. Gal then coded the OvS QoS extension driver [11].

During the last day, we tried to put all the pieces together. John was debugging API->SVC->vo->DB (you’d be amazed if you saw him going through vim or ipdb at high speed), Ihar was polishing the models and versioned objects, Mike was polishing the callbacks, and I was tying together the agent side. We were not able to fully assemble a POC in the end, but we were able to interact from the neutron client with the server across all the layers. The agent side was looking good, but I managed to destroy the environment I was using, so I will be working on it next week.

The plan ahead

We need to assemble the basic POC, make a checklist for missing tests and TODO(QoS), and start enforcing full testing for any other non-poc-essential patch.

Doing it as I write: https://etherpad.openstack.org/p/neutron-qos-testing-gaps

Once that’s done we may be ready to merge back during the end of liberty-2, or the very start of the next one, liberty-3. Since QoS is designed as a separate service, most of the pieces won’t be activated unless explicitly installed, so the risk of breaking anything for anyone not using QoS is very low.

What can be done better

  • Better coordination (in general): I’m not awesome at that, but I guess I had the whole picture of the service, so that’s what I did.
  • Better coordination with remotes: it’s hard when you have a lot of ongoing local discussions and very limited time to sprint. I’m looking forward to finding formulas to enhance that part.

Notes

In my opinion, the mid-cycle coding sprint was very positive: the ability to meet every day, do fast cross-reviews, and very quickly loop in specific people on specific topics was very productive.

I guess remote coding sprints should be very productive too, as long as companies guarantee the ability of people to focus on the specific topic. That said, the face-to-face part is always very valuable.

I was able to learn a lot from all the other participants about specific parts of neutron I wasn’t fully aware of, and by building a service plugin we all got an understanding of full-stack development, from the API request to the database, messaging (or not), the agents, and how it all fits together.

Special thanks

To Gary Kotton, for joining us the first day to understand our plan, and for helping us later with reviews towards merging patches on the branch.

To Livnat Peer, for organizing the event within Red Hat, and making sure we prioritized everything correctly.

To Doug Wiegley and Kyle Mestery, for helping us with rebases from master to the feature branch to clean up gate bugs on time.

References:

[1] http://specs.openstack.org/openstack/neutron-specs/specs/liberty/qos-api-extension.html#rest-api-impact

[2] https://www.dropbox.com/sh/0ixsqk4dz092ppv/AAAd2hVFP-vXErKacjAdc90La?dl=0

[3] Versioned objects 1/2: https://review.openstack.org/#/c/197047

[4] Versioned objects 2/2: https://review.openstack.org/#/c/197876/

[5] Generic RPC callbacks: https://review.openstack.org/#/c/190635/

[6] Pluggable driver backend: https://review.openstack.org/#/c/197631/

[7] OVS Low level (ovsdb): https://review.openstack.org/196373

[8] Agent extensions: https://review.openstack.org/#/c/195439/

[9] QoS agent extension: https://review.openstack.org/#/c/195440/

[10] SRIOV agent extension: https://review.openstack.org/#/c/195441/

[11] OvS QoS extension: https://review.openstack.org/#/c/197557/

[12] API amendment: https://review.openstack.org/#/c/197004/

[13] SVC and extension amendment: https://review.openstack.org/#/c/197078/

Sometimes you write a piece of code within a context, and that context grows wider and wider, or you simply need all the pieces in one place to make sure it works.

Then, for reviewing, or to work in parallel, it makes sense to split your patch into smaller, more logical patchlets. I always need to ask Google how to do it, so let’s write it down here:

Let’s assume $COMMIT is the commit you want to split. Start an interactive rebase from its parent, and mark the commit with the edit action in the todo list:

git rebase -i $COMMIT^

When the rebase stops at that commit, the following command leaves the commit’s changes in the working tree, but moves you back to the previous commit:

git reset HEAD^

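Loop over these two steps until all the changes are committed:

git add -p # pick the pieces of code you want in this patchlet
git commit

And when the working tree is clean, finish the rebase:

git rebase --continue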
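
The whole sequence can be tried safely in a throwaway repository. The sketch below is only an illustration under a few assumptions: the file names and commit messages are made up, whole files are added instead of using git add -p so that it runs unattended, and GIT_SEQUENCE_EDITOR (with a sed one-liner) stands in for marking the commit with edit by hand.

```shell
# Throwaway repository with one base commit and one "too big" commit.
export GIT_EDITOR=true   # never open an interactive editor in this demo
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name "Demo User"
echo base > base.txt
git add base.txt
git commit -q -m "base"
echo a > a.txt
echo b > b.txt
git add a.txt b.txt
git commit -q -m "big commit: a and b"
COMMIT=$(git rev-parse HEAD)

# Mark the first todo entry as "edit" instead of "pick".
GIT_SEQUENCE_EDITOR="sed -i -e 1s/^pick/edit/" git rebase -i "$COMMIT^"

# Back to the previous commit, keeping the changes in the working tree.
git reset -q HEAD^

# One commit per logical piece (use `git add -p` for finer control).
git add a.txt
git commit -q -m "add a"
git add b.txt
git commit -q -m "add b"

git rebase --continue
git log --oneline
```

After the last command, the log should list the base commit followed by the two smaller commits that replaced the big one.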

If you are working with Gerrit, make sure that only one of your patches (probably the biggest one) keeps the original Change-Id, so the change can still be tracked and the old comments remain available.

[image: banana split git joke]

(image credits go to: http://www.nicartoons.com/wallpapers/?id=1)

More interesting git stuff (fixup and autosquash): http://fle.github.io/git-tip-keep-your-branch-clean-with-fixup-and-autosquash.html (thanks to Jakub Libosvar!)

Sometimes you find yourself trying to debug a problem with SELinux, especially during software development or when packaging new software features. I have found this to happen quite often with neutron agents, as new system interactions are developed. Disabling SELinux during development is generally a bad idea, because you’ll discover such problems later in time and under higher pressure (release deadlines). Here we show a recipe, from Kashyap Chamarthy, to find out what rules are missing, and generate a possible SELinux policy:

Make sure SELinux is in enforcing mode

sudo su -
setenforce 1

Clear your audit log, and supposing the problem was in neutron-dhcp-agent, restart it.

 > /var/log/audit/audit.log
systemctl restart neutron-dhcp-agent

Wait for the problem to be reproduced...

Find what you got, and create a reference policy

cat /var/log/audit/audit.log
cat /var/log/audit/audit.log | audit2allow -R
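
On a busy machine the audit log may contain denials from unrelated services. A small filter can narrow it down before feeding audit2allow. This is a sketch under assumptions: the helper name avc_denials_for is made up, and it relies on the usual AVC record layout, where denials contain "avc:  denied" and a comm="..." field with the command name (dnsmasq is only an example of a process started by neutron-dhcp-agent).

```shell
# Hypothetical helper: print only the AVC denials logged for one command.
# $1 = audit log file, $2 = value of the comm="..." field to look for
avc_denials_for() {
    grep -E 'avc: +denied' "$1" | grep "comm=\"$2\""
}

# usage:
# avc_denials_for /var/log/audit/audit.log dnsmasq | audit2allow -R
```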

At that point, report a bug so you get those policies incorporated in advance. Give a good description of what’s blocked by the policies, and why it needs to be unblocked.

In the meantime, you can generate a SELinux loadable module and install it locally, to move on without disabling SELinux as a whole:

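# -a already makes audit2allow read the audit log itself,
# so it is dropped here in favour of the piped input
cat /var/log/audit/audit.log | audit2allow -M neutron

And you can also install it at runtime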

semodule -i neutron.pp

Restart neutron-dhcp-agent (or re-trigger the problem to make sure it’s fixed)

systemctl restart neutron-dhcp-agent

During scalability tests we found that the security_group_rules_for_devices RPC, which is transmitted from neutron-server to the neutron L2 agents during port changes, grew exponentially with the number of ports.

So we filed a spec for juno-3. The effort, led by shihanzhang and me, can be tracked here:

  • https://review.openstack.org/#/c/111876/
  • https://review.openstack.org/#/c/115575/

I have written a test and a little -dirty- benchmark (https://review.openstack.org/#/c/115575/1/neutron/tests/unit/test_security_groups_rpc.py line 418) to check the results and make sure the new RPC actually performs better.

Here are the results:

Message size (Y) vs. number of ports (X) graph:

RPC execution time in seconds (Y) vs. number of ports (X):