Troubleshooting
Connecting to a cluster via VPN doesn't work.
There are a few different issues that could arise when working with a VPN. Please see the dedicated page on Telepresence and VPNs to learn how to fix these.
Connecting to a cluster hosted in a Docker Container or a VM on the workstation doesn't work
The cluster probably has access to the host's network and gets confused when it is mapped by Telepresence. Please see the Cluster in a hosted container or VM page for more details.
Volume mounts are not working on macOS
It's necessary to have sshfs installed in order for volume mounts to work correctly during intercepts. Lately there have been some issues using brew install sshfs on a macOS workstation because the required component osxfuse (now named macfuse) isn't open source and hence is no longer supported. As a workaround, you can now use gromgit/fuse/sshfs-mac instead. Follow these steps:
- Remove the old sshfs, macfuse, and osxfuse packages using brew uninstall
- brew install --cask macfuse
- brew install gromgit/fuse/sshfs-mac
- brew link --overwrite sshfs-mac
Now sshfs -V shows you the correct version, e.g.:
$ sshfs -V
SSHFS version 2.10
FUSE library version: 2.9.9
fuse: no mount point
- Next, try a mount (or an intercept that performs a mount). It will fail because you need to give permission to “Benjamin Fleischer” to execute a kernel extension (a pop-up appears that takes you to the system preferences).
- Approve the needed permission
- Reboot your computer.
Volume mounts are not working on Linux
It's necessary to have sshfs installed in order for volume mounts to work correctly during intercepts.
After you've installed sshfs, if mounts still aren't working:
- Uncomment user_allow_other in /etc/fuse.conf
- Add your user to the "fuse" group with: sudo usermod -a -G fuse <your username>
- Restart your computer after uncommenting user_allow_other
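For reference, a minimal command-line sketch of the first two steps above (this assumes the stock /etc/fuse.conf where the option is present but commented out, and that your distribution already has a "fuse" group):
$ sudo sed -i 's/^#user_allow_other/user_allow_other/' /etc/fuse.conf   # uncomment the option
$ sudo usermod -a -G fuse $USER                                         # add the current user to the fuse group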
DNS is broken on macOS
Commands like dig cannot find cluster resources even though Telepresence is connected to the cluster, but it works with curl.
This is because dig and some other utilities on macOS have their own built-in DNS client which bypasses the macOS native DNS system and uses the libc resolver directly. Here's an excerpt from the dig command's man-page:
Mac OS X NOTICE
The nslookup command does not use the host name and address resolution or the DNS query routing mechanisms used by other processes running on Mac OS X. The results of name or address queries printed by nslookup may differ from those found by other processes that use the Mac OS X native name and address resolution mechanisms. The results of DNS queries may also differ from queries that use the Mac OS X DNS routing library.
A command that should always work is:
$ dscacheutil -q host -a name <name to resolve>
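For example, to look up a cluster service named echo-easy in the default namespace (the service name here is just an illustration):
$ dscacheutil -q host -a name echo-easy.default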
No Sidecar Injected in GKE private clusters
An attempt to telepresence intercept results in a timeout, and upon examination of the pods (kubectl get pods) it's discovered that the intercept command did not inject a sidecar into the workload's pods:
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
echo-easy-7f6d54cff8-rz44k 1/1 Running 0 5m5s
$ telepresence intercept echo-easy -p 8080
telepresence: error: connector.CreateIntercept: request timed out while waiting for agent echo-easy.default to arrive
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
echo-easy-d8dc4cc7c-27567 1/1 Running 0 2m9s
# Notice how only 1/1 containers are ready, i.e. no traffic-agent sidecar was injected.
If this is occurring in a GKE cluster with private networking enabled, it is likely due to firewall rules blocking the Traffic Manager's webhook injector from the API server.
To fix this, add a firewall rule allowing your cluster's master nodes to access TCP port 443 in your cluster's pods, or change the port number that Telepresence is using for the agent injector by providing the number of an allowed port using the Helm chart value agentInjector.webhook.port.
Please refer to the telepresence install instructions or the GCP docs for more information on resolving this.
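As an illustration, such a firewall rule could be created with gcloud; the rule name and the <...> placeholders below are hypothetical and must be replaced with your cluster's actual network, node tag, and control-plane CIDR:
$ gcloud compute firewall-rules create allow-traffic-manager-webhook \
    --direction INGRESS \
    --allow tcp:443 \
    --network <your cluster's VPC network> \
    --source-ranges <your control plane's CIDR block> \
    --target-tags <your cluster's node tag>
Alternatively, a sketch of a Helm values snippet that moves the agent injector to an already-allowed port (the actual port number depends on your firewall rules):
agentInjector:
  webhook:
    port: <allowed port number>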
Injected init-container doesn't function properly
The init-container is injected to insert iptables rules that redirect port numbers from the app container to the traffic-agent sidecar. This is necessary when the service's targetPort is numeric. It requires elevated privileges (NET_ADMIN capabilities), and the inserted rules may get overridden by iptables rules inserted by other vendors, such as Istio or Linkerd.
Injection of the init-container can often be avoided by using a named targetPort instead of a number, and ensuring that the corresponding container's containerPort is also named. This example uses the name "http", but any valid name will do:
apiVersion: v1
kind: Pod
metadata:
  ...
spec:
  containers:
    - ports:
        - name: http
          containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  ...
spec:
  ports:
    - port: 80
      targetPort: http
Telepresence injects an init-container into the pods of a workload only if at least one service specifies a numeric targetPort that references a containerPort in the workload. When this isn't the case, it will instead do the following during the injection of the traffic-agent:
- Rename the designated container's port by prefixing it (i.e., containerPort: http becomes containerPort: tm-http).
- Let the container port of our injected traffic-agent use the original name (i.e., containerPort: http).
Kubernetes takes care of the rest and will now associate the service's targetPort with our traffic-agent's containerPort.
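For illustration only, a heavily simplified sketch of what the pod's ports could look like after such an injection (the real traffic-agent container has many more fields, and its listening port is managed by Telepresence):
spec:
  containers:
    - name: echo-easy            # the original application container
      ports:
        - name: tm-http          # original name "http", prefixed by the injector
          containerPort: 8080
    - name: traffic-agent        # the injected sidecar
      ports:
        - name: http             # takes over the original port name
          containerPort: 9900    # example value; the agent's port number is chosen by Telepresence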
If the service is "headless" (using ClusterIP: None), then using named ports won't help because the targetPort will not get remapped. A headless service will always require the init-container.
EKS, Calico, and Traffic Agent injection timeouts
When using EKS with Calico CNI, the Kubernetes API server cannot reach the mutating webhook
used for triggering the traffic agent injection. To solve this problem, try providing the
Helm chart value "hostNetwork=true"
when installing or upgrading the traffic-manager.
More information can be found in this blog post.
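A minimal sketch of how that value might be passed, assuming the traffic-manager was installed with the standard Helm chart (the release name, chart reference, and namespace below are the common defaults and may differ in your setup):
$ helm upgrade traffic-manager datawire/telepresence \
    --namespace ambassador \
    --reuse-values \
    --set hostNetwork=true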
Error connecting to GKE or EKS cluster
GKE and EKS require a plugin that utilizes their respective IAM providers. You will need to install the gke or eks plugins for Telepresence to connect to your cluster.
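As a rough sketch (the exact plugin depends on how your kubeconfig was generated): for GKE the credential plugin is gke-gcloud-auth-plugin, installable through the gcloud CLI, and for EKS the AWS CLI's IAM credential provider is wired in when the kubeconfig is (re)generated with aws eks update-kubeconfig:
# GKE
$ gcloud components install gke-gcloud-auth-plugin

# EKS
$ aws eks update-kubeconfig --name <cluster name> --region <region>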
too many files open error when running telepresence connect on Linux
If telepresence connect on Linux fails with a too many files open message in the logs, then check if fs.inotify.max_user_instances is set too low. Check the current setting with sysctl fs.inotify.max_user_instances and increase it temporarily with sudo sysctl -w fs.inotify.max_user_instances=512. For more information about permanently increasing it, see Kernel inotify watch limit reached.
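As a sketch, the setting can also be made persistent across reboots by dropping it into a sysctl configuration file (the file name below is arbitrary):
$ echo 'fs.inotify.max_user_instances=512' | sudo tee /etc/sysctl.d/99-inotify.conf
$ sudo sysctl --system   # reload all sysctl configuration files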