Monday, November 13, 2023

Writing an Envoy Filter like a mere mortal (not a Ninja)

This article, like its predecessor Quickly Building Envoy Proxy, is an attempt to document what should have been widely documented but isn't. Serious open source communities sometimes function in an elitist way, perhaps to keep the entry bar high. Maybe that's why they consciously avoid documenting the basic mechanisms that people need in order to work with the code base. But for commoners like me, this lack of documentation becomes motivation to figure things out and write about them, in the hope that someone else finds it easier.

If you're building Envoy from source, chances are you have a reason to modify its code, and one of the most common things people need to do with the Envoy code base is to write new filters.

What are filters?

You possibly already know this, but filters are pluggable code that you can run within Envoy to customize how incoming requests are processed, what responses are sent, and so on. When a request enters Envoy, it enters through a listener (a listener port). The request then goes through a filter chain - which might be chosen from among multiple filter chains based on some criteria, like the source or destination address or port. Each filter chain is a sequence of filters - logic that runs on the incoming request and perhaps modifies it, or determines whether and how it is processed further.

Filters can be written in C++, Lua, Wasm (and therefore any language that compiles to Wasm), and apparently in Go and Rust too. I know precious little about the last two, but they sound interesting. Lua filters are quite limited in many ways, so I decided not to focus on them. Native filters in C++ seem adequate for most purposes, so this post is about them.

What are the different types of filters?

There are different kinds of filters. Listener filters are useful if some actions need to be performed while accepting connections. Then there are network filters, which operate at L4 on requests and responses. Two such filters are HTTP Connection Manager (HCM), which is used to process HTTP traffic, and TCP Proxy, which is used to route generic TCP traffic. Within HCM, it is possible to load a further class of filters which are aware of the HTTP protocol - these are called HTTP filters and operate at L7. In this article, we will focus on L4 filters, i.e. network filters.

Where does one write filters?

Filter writing seems to have gotten a wee bit easier over successive Envoy versions and seemed somewhat agreeable when I tried it on version 1.27.x. One can write a filter in a separate repo and include all of the Envoy code as a submodule. However, I wanted to write an in-tree filter - just like the several filters already part of the code base.

How to write your first filter?

We will write a network filter which will log the destination IP of a request as it was received, and then forward the request to some destination based on a mapping of destination IPs to actual target addresses. This is actually useful functionality which isn't available out of the box from Envoy, but requires just a small amount of filter code to get going. We will call this filter the Address Mapper filter. So what do we need?

The config proto

Most filters need some configuration. In our case, the filter takes a map of IP addresses to cluster names - Envoy clusters representing one or more endpoints where the traffic could be sent. So essentially, we are looking for a map<string, string> as input to the filter. However, to make things a bit more type-safe, Envoy needs to know exactly what the type of the input config is, so we must define a protobuf message describing it. Since this is a network filter named address_mapper, we create a directory called address_mapper under api/envoy/extensions/filters/network. One further sub-directory under it, v3, holds the actual protos; v3 represents the current generation of Envoy's config API (v1 and v2 are obsolete). The proto file, address_mapper.proto, is placed under v3 and has the following content.

syntax = "proto3";

package envoy.extensions.filters.network.address_mapper.v3;

import "udpa/annotations/status.proto";

option java_package = "io.envoyproxy.envoy.extensions.filters.network.address_mapper.v3";
option java_outer_classname = "AddressMapper";
option java_multiple_files = true;
option go_package = "github.com/envoyproxy/go-control-plane/envoy/extensions/filters/network/address_mapper/v3;address_mapperv3";
option (udpa.annotations.file_status).package_version_status = ACTIVE;

// [#protodoc-title: Address mapper]
// Address mapper :ref:`configuration overview <config_network_filters_address_mapper>`.
// [#extension: envoy.filters.network.address_mapper]

message AddressMapper {
  // address_map is expected to contain a 1:1 mapping of
  // IP addresses to other IP addresses or FQDNs.
  map<string, string> address_map = 1;
}

We must also create a Bazel BUILD file in the same directory, and that's about the limit of what I am qualified to say about these abominations used to build the whole Envoy binary and its various parts. So here it is:

# DO NOT EDIT. This file is generated by tools/proto_format/proto_sync.py.

load("@envoy_api//bazel:api_build_system.bzl", "api_proto_package")

licenses(["notice"])  # Apache 2

api_proto_package(
    deps = ["@com_github_cncf_udpa//udpa/annotations:pkg"],
)

If at this time (or at any other time) you want to interject profanities about Bazel, you know it is wrong.

Anyway, you need to link your proto up to the build chain, which means making entries inside api/BUILD and api/versioning/BUILD. Make the following entry under the v3_protos library in api/BUILD, and under active_protos in api/versioning/BUILD.

"//envoy/extensions/filters/network/address_mapper/v3:pkg",

We must also register a type URL so that Envoy can recognize and instantiate the config message of the correct type. To do this we create an entry for the AddressMapper message inside source/extensions/extensions_metadata.yaml.

envoy.filters.network.address_mapper:
  categories:
  - envoy.filters.network
  security_posture: robust_to_untrusted_downstream_and_upstream
  status: stable
  type_urls:
  - envoy.extensions.filters.network.address_mapper.v3.AddressMapper

This introduces the new filter, with the type URL for the config proto on the last line. We must also tell Bazel where the source code for the new filter lives. To do this we edit source/extensions/extensions_build_config.bzl, creating the following entry in the network filters section:

"envoy.filters.network.address_mapper":                       "//source/extensions/filters/network/address_mapper:config",

Envoy must also recognize the fully-qualified string representing the new network filter we are going to create. Because it is a network filter, we add it in source/extensions/filters/network/well_known_names.h. Inside the class NetworkFilterNameValues, we add the following const member.

// Address mapper filter
const std::string AddressMapper = "envoy.filters.network.address_mapper";

The filter logic

We must add the filter logic somewhere. To do this, we create a new directory called address_mapper under source/extensions/filters/network. We first add the AddressMapperFilter filter class definition in address_mapper.h, along with an AddressMapperConfig class which wraps the config message passed via the Envoy config. These all live inside the Envoy::Extensions::NetworkFilters::AddressMapper namespace.

class AddressMapperConfig {
public:
  AddressMapperConfig(const FilterConfig& proto_config);

  absl::string_view getMappedAddress(const absl::string_view& addr) const;

private:
  absl::flat_hash_map<std::string, std::string> addr_map_;
};
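
The implementations of these two members go into address_mapper.cc. Here is a minimal sketch, assuming FilterConfig is an alias for the generated envoy::extensions::filters::network::address_mapper::v3::AddressMapper class (as the constructor signature above implies); real code might also want to validate that the keys are well-formed IP addresses.

AddressMapperConfig::AddressMapperConfig(const FilterConfig& proto_config) {
  // Copy the proto map into a flat_hash_map for cheap per-connection lookups.
  for (const auto& entry : proto_config.address_map()) {
    addr_map_[entry.first] = entry.second;
  }
}

absl::string_view AddressMapperConfig::getMappedAddress(const absl::string_view& addr) const {
  auto it = addr_map_.find(addr);
  // An empty view signals "no mapping configured for this address".
  return it != addr_map_.end() ? absl::string_view(it->second) : absl::string_view();
}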

The filter takes a shared_ptr to the above config class.

using AddressMapperConfigPtr = std::shared_ptr<AddressMapperConfig>;

class AddressMapperFilter : public Network::ReadFilter, Logger::Loggable<Logger::Id::filter> {
public:
  AddressMapperFilter(AddressMapperConfigPtr config);

  // Network::ReadFilter
  Network::FilterStatus onData(Buffer::Instance&, bool) override {
    return Network::FilterStatus::Continue;
  }

  Network::FilterStatus onNewConnection() override;

  void initializeReadFilterCallbacks(
          Network::ReadFilterCallbacks& callbacks) override {
    read_callbacks_ = &callbacks;
  }

private:
  Network::ReadFilterCallbacks* read_callbacks_{};
  AddressMapperConfigPtr config_;
};

The implementation of the onNewConnection method is in the address_mapper.cc file. For example, we can get the original destination address like this.

Network::Address::InstanceConstSharedPtr dest_addr =
      Network::Utility::getOriginalDst(const_cast<Network::Socket&>(read_callbacks_->socket()));

We can then map this address to a target cluster and hand that information to the TCP proxy filter, as sketched below.
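
To show how the pieces fit together, here is a rough sketch of what the filter constructor and onNewConnection could look like in address_mapper.cc (with the corresponding includes for Network::Utility and the TCP proxy filter state, which the BUILD dependencies shown later cover). Treat this as a sketch rather than authoritative code: the FilterState::setData() signature and LifeSpan names have shifted across Envoy versions, so cross-check against the tcp_proxy sources in your checkout.

AddressMapperFilter::AddressMapperFilter(AddressMapperConfigPtr config) : config_(std::move(config)) {}

Network::FilterStatus AddressMapperFilter::onNewConnection() {
  // Original destination address of the connection, obtained as shown above.
  Network::Address::InstanceConstSharedPtr dest_addr =
      Network::Utility::getOriginalDst(const_cast<Network::Socket&>(read_callbacks_->socket()));
  if (!dest_addr || dest_addr->type() != Network::Address::Type::Ip) {
    return Network::FilterStatus::Continue;
  }

  // Look up the bare IP (no port) in the configured map; empty means no mapping.
  absl::string_view cluster = config_->getMappedAddress(dest_addr->ip()->addressAsString());
  if (cluster.empty()) {
    ENVOY_LOG(debug, "address_mapper: no mapping for {}", dest_addr->asString());
    return Network::FilterStatus::Continue;
  }

  // Record the chosen cluster as per-connection filter state so that the TCP proxy
  // filter later in the chain routes this connection to it.
  read_callbacks_->connection().streamInfo().filterState()->setData(
      TcpProxy::PerConnectionCluster::key(),
      std::make_shared<TcpProxy::PerConnectionCluster>(cluster),
      StreamInfo::FilterState::StateType::ReadOnly,
      StreamInfo::FilterState::LifeSpan::Connection);
  return Network::FilterStatus::Continue;
}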

Someone has to instantiate this filter and pass it the correct type of argument (AddressMapperConfigPtr). That responsibility falls to the glue code, or filter factory, which we look at next.

The glue code

We define the config factory (AddressMapperConfigFactory) class inside the config.h header in the filter directory. These are all inside the Envoy::Extensions::NetworkFilters::AddressMapper namespace.

class AddressMapperConfigFactory
    : public Common::FactoryBase<
       envoy::extensions::filters::network::address_mapper::v3::AddressMapper> {
public:
  AddressMapperConfigFactory() : FactoryBase(NetworkFilterNames::get().AddressMapper) {}

  /* ProtobufTypes::MessagePtr createEmptyConfigProto() override; */
  std::string name() const override { return NetworkFilterNames::get().AddressMapper; }

private:
  Network::FilterFactoryCb createFilterFactoryFromProtoTyped(
      const envoy::extensions::filters::network::address_mapper::v3::AddressMapper& proto_config,
      Server::Configuration::FactoryContext&) override;
};

We now add the implementation for createFilterFactoryFromProtoTyped, which is the entry point for filter instantiation.

Network::FilterFactoryCb AddressMapperConfigFactory::createFilterFactoryFromProtoTyped(
    const envoy::extensions::filters::network::address_mapper::v3::AddressMapper& proto_config,
    Server::Configuration::FactoryContext&) {

  AddressMapperConfigPtr filter_config = std::make_shared<AddressMapperConfig>(proto_config);
  return [filter_config](Network::FilterManager& filter_manager) -> void {
    filter_manager.addReadFilter(std::make_shared<AddressMapperFilter>(filter_config));
  };  
}

Given the protobuf config as input, this code returns a callback that Envoy invokes for every new connection handled by the filter chain; the callback creates an instance of the actual filter initialized with this config and adds it as a read filter.
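
One more piece belongs in config.cc: the factory must be registered with Envoy's extension registry so that the filter name can be resolved when the configuration is loaded. In-tree filters do this with the REGISTER_FACTORY macro, along these lines:

// Make the factory discoverable under the envoy.filters.network.address_mapper name.
REGISTER_FACTORY(AddressMapperConfigFactory, Server::Configuration::NamedNetworkFilterConfigFactory);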

How to compile your first filter?

You need to ensure that your filter code is included in the build. Create a BUILD file in your filter directory with the following content.

load(
    "//bazel:envoy_build_system.bzl",
    "envoy_cc_extension",
    "envoy_cc_library",
    "envoy_extension_package",
)

licenses(["notice"])  # Apache 2

envoy_extension_package()

envoy_cc_library(
    name = "address_mapper",
    srcs = ["address_mapper.cc"],
    hdrs = ["address_mapper.h"],
    deps = [
        "//envoy/network:connection_interface",
        "//envoy/network:filter_interface",
        "//source/common/common:assert_lib",
        "//source/common/common:minimal_logger_lib",
        "//source/common/tcp_proxy",
        "//source/common/protobuf:utility_lib",
        "//source/common/network:utility_lib",
        "@envoy_api//envoy/extensions/filters/network/address_mapper/v3:pkg_cc_proto",
    ],
    alwayslink = 1,
)

envoy_cc_extension(
    name = "config",
    srcs = ["config.cc"],
    hdrs = ["config.h"],
    deps = [
        ":address_mapper",
        "//envoy/registry",
        "//envoy/server:filter_config_interface",
        "//source/extensions/filters/network/common:factory_base_lib",
        "//source/extensions/filters/network:well_known_names",
        "@envoy_api//envoy/extensions/filters/network/address_mapper/v3:pkg_cc_proto",
    ],
)

The exact dependencies listed depend on what you need to call from within your filter code. For example, the protobuf utility_lib and the network utility_lib are listed, as is the network connection_interface; the tcp_proxy dependency is there because the filter references TcpProxy::PerConnectionCluster.

The previous article in this series already shows how to build Envoy, and that is all you need to do to build Envoy with this filter enabled. One handy option is to build Envoy with debug symbols. This is quite easy:

bazel build -c dbg envoy

The binary is created under ./bazel-bin/source/exe/envoy-static.

Configuring Envoy to run your filter

In our case, we want the filter to sit just before the TCP Proxy filter in the filter chain. So the config should look like this:

          - name: envoy.filters.network.address_mapper
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.address_mapper.v3.AddressMapper
              address_map:
                "169.254.1.2": "cluster_1"
                "169.254.1.3": "cluster_2"
          - name: envoy.filters.network.tcp_proxy
            typed_config:
              ...

The assumption is that the clusters cluster_1 and cluster_2 are defined separately elsewhere in the config. Our filter checks whether the original destination IP of the incoming connection matches one of the IPs listed in the address map. If it does, it sets a filter-state object (TcpProxy::PerConnectionCluster) on the connection's StreamInfo, which tells the ensuing TCP proxy filter to forward the traffic to the mapped cluster.

Conclusion

There are lots of gaps in this article (because it was hurriedly written), but refer to existing filter code to fill those gaps in. It should be fairly straightforward.




Monday, October 02, 2023

Quickly Building Envoy Proxy

Building Envoy isn't all that hard. We have to use Bazel / Bazelisk for the process. Here are the steps summarized for quick reference:

cd ~/Downloads
wget https://github.com/bazelbuild/bazelisk/releases/latest/download/bazelisk-linux-amd64
sudo mv ~/Downloads/bazelisk-linux-amd64 /usr/local/bin/bazel
sudo chmod +x /usr/local/bin/bazel

Install / upgrade some local packages:
sudo apt install autoconf libtool curl patch python3-pip unzip virtualenv

Download Envoy source code:
mkdir -p github.com/envoyproxy
cd github.com/envoyproxy
git clone https://amukherj@github.com/envoyproxy/envoy

Download and install clang+llvm:
wget https://github.com/llvm/llvm-project/releases/download/llvmorg-16.0.0/clang+llvm-16.0.0-x86_64-linux-gnu-ubuntu-18.04.tar.xz
tar xf clang+llvm-16.0.0-x86_64-linux-gnu-ubuntu-18.04.tar.xz -C ~/devel/tools
ln -s clang+llvm-16.0.0-x86_64-linux-gnu-ubuntu-18.04 ~/devel/tools/clang+llvm

Install additional go utilities:
go install github.com/bazelbuild/buildtools/buildifier@latest
export BUILDIFIER_BIN=/home/amukher1/devel/go/bin/buildifier
go install github.com/bazelbuild/buildtools/buildozer@latest
export BUILDOZER_BIN=/home/amukher1/devel/go/bin/buildozer

Build the code. This step can take well over an hour, depending on your machine resources.
bazel build envoy



Monday, September 18, 2023

My Driving Principles

The other day, I was thinking of things I heard or read, that left a lasting impression on me and changed how I approached life. In most cases, I remember when I heard or read them, who said it or where it was written, and in some cases, why it had an impression on me.

  1. Pride comes before a fall. Aka, don't be a narcissist.
  2. We don't deserve all the credit for our successes, nor all the blame for our failures. So judge kindly.
  3. Doing the same thing over and over again expecting different results is tantamount to insanity.
  4. Patience is godliness.
  5. The difference between a job done well and a shoddy job is often only a little extra time, effort, and care.
  6. The wise resist pleasure while the fool becomes its slave. Without moderation the greatest pleasures fade.
  7. Success comes to those who do what they must, even if they don't feel like doing it. Every single day.
  8. You never forget what you learn by teaching others.
  9. Always learn from your mistakes, but try as much as possible to learn from those of others.
  10. To live is to learn every single day.
  11. You can't always repay kindness, but pay it forward to someone else.
  12. Being able to accommodate someone's imperfection and not making them feel bad about it is a great virtue. (I can't always practice this in my inner circle and this has been the most challenging principle for me.)
There might be a few others, but these are pretty much the principles that have defined and shaped how I think. In most respects, I am a work-in-progress in the light of these principles - but they give me direction.


Sunday, September 17, 2023

Go versioning and modules

Go has evolved into the de facto language for building control plane and management plane services, and along the way the Go tooling has picked up quite a few semantics around versioning and dependency management. This is my attempt at a round up of it all, leaving the details to the fantastic articles linked at the end of this post.

Package versions

If you maintain a Go package on a git repo that other Go programs import, then it is strongly recommended to have a notion of versioning and release defined for your package and repo.

  • The standard practice is to use semver (semantic versioning) of the form <MAJOR_VER>.<MINOR_VER>.<PATCH_VER>.
  • An optional pre-release version tag can be suffixed using a hyphen, such as 1.2.0-alpha or 1.2.9-beta.2. A pre-release version is considered an unstable version.
  • Major versions indicate API generations.
    • Major version 0 indicates evolution and potentially unstable interfaces and implementations.
    • Major version 1 indicates that the API has stabilized, although additional interfaces could be added.
    • Further major version updates are mandated if and only if there are breaking changes in the API.
  • Minor versions indicate API and implementation progression that maintain backward compatibility.
  • Patch versions indicate bug fixes and improvements without API changes.
A point to note is that major version 0 is treated a bit differently. You could have breaking changes between 0.1 and 0.2, for example, and backward compatible API changes between 0.1.1 and 0.1.2. This is unlike say 1.1 and 1.2, which cannot have breaking changes between them, and 1.1.1 and 1.1.2 which cannot have API changes (even backward-compatible ones) between them.

For reasons that would become clear shortly, it is important to tag specific commits with these versions in the repo, using git tags. This allows the git repo containing your Go code to participate in versioned dependency management that go tools support. The convention for tags is the letter v followed by the version string.

Modules and packages

Modules are now the standard way for doing dependency management in Go. A module is just a directory structure with Go source code under it, and a go.mod file at the root of the directory structure acting as a manifest. The go.mod file contains the module path name that would be used to address the module, and the Go version used to generate the module. In addition, it contains a list of external packages that the code inside the module depends on, and their versions. Three things are important:
  • Code inside and outside the module should import packages inside the module using the module path and the path of the package within the module relative to the module root.
  • Go tooling automatically determines a module version to use for each package dependency.
    • If available, the latest tagged stable version is used. Here stable refers to a tagged version without a pre-release suffix.
    • If not, then the latest pre-release (i.e. unstable) version is used.
    • If not, then the latest untagged version is used. A pseudo-version string of the form v0.0.0-<yyyymmddhhmmss>-<commit_hash> is generated for it. This is the reason it is better to have tagged versions indicating stability and backward compatibility.
  • Once a dependency of a module on a specific version of another package is established, it would not be automatically changed.
There are several important commands related to go modules that we need to know, and use as needed.

To create a module in a directory, we must run the following at the module root:
go mod init <module_path>
This generates the go.mod file.

To update module dependencies, including cleaning up obsolete dependencies, we must run the following at the module root:
go mod tidy
This updates the dependencies in the go.mod file.

In older Go releases, just building code within a module would also add missing dependencies to go.mod, though it would not clean up obsolete ones. With current toolchains, you add a new dependency explicitly, without running a build, using:
go get <package>

To update a dependency to its latest minor or patch version, we can run:
go get -u <package>
Or, to specifically upgrade only the patch version without upgrading the minor version:
go get -u=patch <package>
One can also move to a specific version of a dependency:
go get <package>@<semver>
You'd need this when you want to test your code against a pre-release version of a package (that has used some discipline to also define and tag stable versions).

Major version upgrades

A major version upgrade for a Go package should be rare, and it typically becomes rarer after the first couple of major versions. Why? Because a major version upgrade typically represents a new API and a new way of doing things. It necessarily breaks the older API, which means the older API must continue to be supported for a decent time for clients to move to the newer version (unless for some specific or perverse reason you can leave your clients in the lurch). Accordingly, Go tooling treats v2 and beyond differently from v0 and v1.

Essentially, your code can link against two different major versions of a given package. This is not possible with two different minor or patch versions under the same major version of a given package. This allows parts of your code to start using a newer major version of a dependency without all the code moving at the same time. This may not always be technically feasible, but when it is, it is convenient.

  • The package maintainer would typically create their new major version package in a separate subdirectory of the older package, and name it v2 or v3, or so on, as a convention.
    • The code for the new major version could be a copy of the old code that is then changed, or a complete rewrite, or some mix of the two. Internally, the code for the new major version may continue to call older code that is still useful. These details are hidden from the client.
  • The client would import the package with the new version by including the /v2 or /v3 etc. version directory suffix in the package path.
    • v0 and v1 do not use a suffix in the module path, but from v2 onwards the suffix is required.
    • If two different major versions are imported, a different explicit alias is used for the higher version. For example, mypkg and mypkgV2.
    • Going ahead, at some point if all dependencies on v0/v1 are removed, the mypkgV2 alias can be dropped as well, and the Go compiler will import the mypkg/v2 package under the name mypkg automatically.

Private repos

<Stuff to cover about the GOPRIVATE environment variable, personal access tokens, etc.>

References

The sequence of articles starting here is actually all you need and more.

  1. Using Go Modules
  2. Go Workspace



Wednesday, August 30, 2023

Timeouts In Envoy

Just a quick summary post. Envoy allows configuring various timeouts that allow tweaking the behavior of HTTP as well as general TCP traffic. Here is a summary of a few common ones. 

Why am I writing this? Like with a lot of Envoy documentation, all of this is documented, but not in a shape that is easy and quick to grok. Note that you need to understand Envoy and the structure of its configuration to fully understand some of the details referred to, but you can form an idea even without it if you understand TCP and HTTP in general.

There are chiefly three kinds of timeouts:

  1. Connection timeout: how long does it take to establish a connection?
  2. Idle timeout: how long does the connection exist without activity?
  3. Max duration: after how long is the connection broken irrespective of whether it is active or not? This is often disabled by default.

These often apply to both downstream and upstream connections, and are configured appropriately either under a listener (in HTTP Connection Manager or TCP proxy) or in a cluster.

Connection timeouts

How long does it take to establish a connection?

This is a general scenario which can apply to either plain TCP or HTTP connection. There is also an HTTP analog in the form of stream timeout or the time it takes to establish an HTTP/2 or HTTP/3 stream.

A very HTTP-specific timeout is: How long would the proxy wait for an upstream to start responding after completely sending an HTTP request to it?

This is called the route timeout; it is set at the route level and defaults to 15s. It can of course be overridden for individual routes.

Idle timeouts

How long can a connection stay idle without traffic in either direction?

Again, a general scenario that could apply to either plain TCP or HTTP connections. With HTTP/2 and above, idleness would require no active streams for a certain period. There is also an HTTP analog for streams in the form of an idle timeout for individual streams. These can also be overridden at the HTTP route level.

Here is another one. How long should a connection from the proxy to an upstream remain intact if there are no corresponding connections from a downstream to the proxy?

This is called the TCP protocol idle timeout; it is only available for plain TCP and is in fact a variation of the idle timeout.

Max duration

How long can a connection remain established at all, irrespective of whether there is traffic or not? This is normally disabled by default. It is not available for plain TCP, only for HTTP. Even when enabled, if there are active streams, those are drained before the connection is terminated. It may be useful in situations where we want to avoid stickiness, or where upstream addresses have changed and we need reconnection without the older endpoints going away. There is an HTTP analog for maximum stream duration. These can also be overridden at the HTTP route level.

There are a few other timeouts with specific uses available, but the above is a good summary.




Friday, August 25, 2023

All about JWKS or JSON Web Key Sets

What are JSON Web Key Sets?

Refer to this to understand what a JWKS looks like: https://auth0.com/docs/secure/tokens/json-web-tokens/json-web-key-set-properties

In addition, refer to this: https://redthunder.blog/2017/06/08/jwts-jwks-kids-x5ts-oh-my/.

Besides, here are some handy commands.

First up, to get the public key from the cert, run:

openssl x509 -pubkey -noout -in <cert_file>

To generate the value of n, run:

openssl rsa -pubin -modulus -noout < public.key

Finally, to get the exponent (e), run:

openssl rsa -pubin -inform PEM -text -noout < public.key

The kid field needs to be some value that uniquely identifies which key was used to sign the token. x5t is the SHA-1 thumbprint of the leaf cert, but it is optional and can be skipped.

What good are they?

They are used to put together multiple cert bundles, which can be used to validate auth tokens such as JWS tokens. Many systems, including Envoy, take the bundle in JWKS format, and this also works well with SPIFFE/SPIRE type systems.




Sunday, June 11, 2023

Configuring Calico CNI with VPP Dataplane for Kubernetes

This is a quick run-down of how to configure Calico for pod networking in a Kubernetes cluster. Calico comes in several flavors, and we look at Calico with the VPP data plane, as opposed to classic Calico. One reason for looking at this option is to be able to encrypt node-to-node traffic using IPsec, which is supported by VPP but not by classic Calico.

Installing Kubernetes

This article doesn't cover how to install Kubernetes - there are several guides for doing that, including this one. Once you have installed Kubernetes on your cluster nodes, and the nodes have all joined the cluster, it is time to install the CNI plugin to allow pods to communicate across nodes. However, there are a few things that need to be ensured even while configuring Kubernetes, before we actually get to installing the CNI plugin.

Calico by default uses the subnet 192.168.0.0/16 for the pod network. It's good to use this default if you can, but if you cannot, choose the subnet you want to use. Then set Kubernetes up with the alternative CIDR you have in mind. If you use kubeadm to set up Kubernetes, then use the --pod-network-cidr command-line option to specify this CIDR. Here is an example command-line to do this on the (first) master node:

    kubeadm init --apiserver-advertise-address=10.20.0.7 --control-plane-endpoint=10.20.0.7:6443 --pod-network-cidr=172.31.0.0/16

The output of this command would contain the kubeadm command-line to run on the other nodes to register them with the master. At this point, running kubectl get nodes would list the cluster nodes but they would be shown in the NotReady state. To change that, we would need to install Calico.

Installing Calico

The page here already summarizes the process of installing Calico with VPP quite well, but there are a few things that need to be called out.

Hugepages and vfio-pci

This step is required only if you want to choose a specific VPP driver for the physical interface, namely virtio, dpdk, rdma, vmxnet3 (VMware), or avf (certain Intel NICs). Even if you leave the driver at the default, these settings can improve performance, though the memory requirements per node would typically be higher.

On each cluster node, create a file called /etc/sysctl.d/calico.conf and add the following content.

vm.nr_hugepages = 512

Then run:

    sudo sysctl -p /etc/sysctl.d/calico.conf

Similarly, on each cluster node create a file called /etc/modules-load.d/calico-vfio-pci.conf, and put the following line in it.

vfio-pci

On CentOS / RedHat, this should be vfio_pci instead. Then run:

    sudo modprobe vfio-pci   # or vfio_pci on CentOS / RedHat

Finally, reboot the node.

Customizing the installation config

Create the Tigera operator:

    kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.24.6/manifests/tigera-operator.yaml

Download the installation-default.yaml file and modify it as suggested below:

    wget https://raw.githubusercontent.com/projectcalico/vpp-dataplane/v3.24.0/yaml/calico/installation-default.yaml

In this file, there are two objects listed: an Installation object and an APIServer object. We only edit the first. Under the spec.calicoNetwork sub-object of Installation, add the ipPools attribute as shown below:

spec:
  calicoNetwork:
    linuxDataplane: VPP
    ipPools:
    - cidr: 172.31.0.0/16    # or whatever you chose for your pod network CIDR
      encapsulation: VXLAN   # or IPIP
      natOutgoing: Enabled

While not commonly required, there is an option to override the image prefix used, in order to download images from non-default image registries / mirrors. This came in handy for me because I could use a corporate mirror instead of the default docker.io which had strict rate limits imposed. Use it thus:

spec:
  imagePrefix: some-prefix-ending-in-fwd-slash

Then apply this edited manifest:

    kubectl create -f installation-default-edited.yaml

This would create a number of pods, including the calico API controller, the calico node daemonset pods, the calico typha daemonset pods, etc. The calico node daemonset pods would not come up, though, till the VPP dataplane is installed.

Installing VPP

To install VPP, you have to use one of two manifests, depending on whether you configured hugepages (use https://raw.githubusercontent.com/projectcalico/vpp-dataplane/v3.24.0/yaml/generated/calico-vpp.yaml) or not (use https://raw.githubusercontent.com/projectcalico/vpp-dataplane/v3.24.0/yaml/generated/calico-vpp-nohuge.yaml). Download the appropriate YAML, and then make the following edit if needed.

The vpp_dataplane_interface attribute should be set to the name of the NIC that would be used for the node-to-node communication. By default it is set to eth1, but if that's not the interface that would be used on your node (e.g. on my cluster, I am using eth0), then set this appropriately.

Then apply:

    kubectl apply -f calico-vpp.yaml   # or calico-vpp-nohuge.yaml

This would install the calico-vpp-dataplane daemonset on the cluster nodes. If all went well, then all the pods related to calico, and the core-dns pods should be up and running in a few minutes.

Enabling IPsec

For this, the instructions here are already adequate. You need to create a secret and put a pre-shared key in it:

    kubectl -n calico-vpp-dataplane create secret generic \
    calicovpp-ipsec-secret \
    --from-literal=psk="$(dd if=/dev/urandom bs=1 count=36 2>/dev/null | base64)"

Then patch the calico-vpp-node daemonset with the ipsec configuration:

    kubectl -n calico-vpp-dataplane patch daemonset calico-vpp-node \
    --patch "$(curl https://raw.githubusercontent.com/projectcalico/vpp-dataplane/v3.24.0/yaml/components/ipsec/ipsec.yaml)"


Packet Tracing and Troubleshooting

WIP, but this is where the docs are sketchy.

