This repository was archived by the owner on Jul 13, 2025. It is now read-only.

Conversation

@github-actions
No description provided.

knyar and others added 30 commits November 18, 2025 18:04
Existing compaction logic seems to have had an assumption that
markActiveChain would cover a longer part of the chain than
markYoungAUMs. This prevented long but fresh chains from being
compacted correctly.

Updates tailscale/corp#33537

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
We use `tka.AUMHash` in `netmap.NetworkMap`, and we serialise it as JSON
in the `/debug/netmap` C2N endpoint. If the binary omits Tailnet Lock support,
the debug endpoint returns an error because it's unable to marshal the
AUMHash.

This patch adds a sentinel value so this marshalling works, and we can
use the debug endpoint.

Updates #17115

Signed-off-by: Alex Chan <alexc@tailscale.com>

Change-Id: I51ec1491a74e9b9f49d1766abd89681049e09ce4
As part of the conn25 work we will want to be able to keep track of a
pool of IP Addresses and know which have been used and which have not.

Fixes tailscale/corp#34247

Signed-off-by: Fran Bull <fran@tailscale.com>
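The pool of IP addresses described in the commit above could look roughly like the following sketch. It assumes a simple "lowest free address first" policy; the names `ipPool`, `mustPool`, `next`, and `release` are hypothetical and are not taken from the conn25 code.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// ipPool is a hypothetical pool that hands out addresses from a prefix
// and remembers which ones are currently in use.
type ipPool struct {
	prefix netip.Prefix
	used   map[netip.Addr]bool
}

var errExhausted = errors.New("ip pool exhausted")

// mustPool builds a pool from a CIDR string and panics on a parse error.
func mustPool(cidr string) *ipPool {
	p := netip.MustParsePrefix(cidr).Masked()
	return &ipPool{prefix: p, used: make(map[netip.Addr]bool)}
}

// next returns the lowest unused address in the prefix and marks it used.
func (p *ipPool) next() (netip.Addr, error) {
	for a := p.prefix.Addr(); p.prefix.Contains(a); a = a.Next() {
		if !p.used[a] {
			p.used[a] = true
			return a, nil
		}
	}
	return netip.Addr{}, errExhausted
}

// release marks an address as unused so it can be handed out again.
func (p *ipPool) release(a netip.Addr) { delete(p.used, a) }

func main() {
	pool := mustPool("100.64.0.0/30")
	first, _ := pool.next()
	second, _ := pool.next()
	pool.release(first)
	reused, _ := pool.next()
	fmt.Println(first, second, reused)
}
```

The linear scan keeps the sketch short; a production pool would likely track the free set more efficiently.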
…co key rotation

Adds the ability to rotate discovery keys on running clients, needed for
testing upcoming disco key distribution changes.

Introduces key.DiscoKey, an atomic container for a disco private key,
public key, and the public key's ShortString, replacing the prior
separate atomic fields.

magicsock.Conn has a new RotateDiscoKey method, and access to this is
provided via localapi and a CLI debug command.

Note that this implementation is primarily for testing as it stands, and
regular use should likely introduce an additional mechanism that allows
the old key to be used for some time, to provide a seamless key rotation
rather than one that invalidates all sessions.

Updates tailscale/corp#34037

Signed-off-by: James Tucker <james@tailscale.com>
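As an illustration of why one atomic container is preferable to separate atomic fields, here is a minimal sketch that swaps the private key, public key, and short string together behind a single pointer. The types `discoState` and `discoKey` and their methods are hypothetical stand-ins (plain strings replace real key types) and do not reflect the actual `key.DiscoKey` API.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// discoState is a hypothetical bundle of values that must change together.
type discoState struct {
	private string
	public  string
	short   string
}

// discoKey holds the current state behind one atomic pointer, so readers
// never observe a private key paired with a stale public key.
type discoKey struct {
	cur atomic.Pointer[discoState]
}

// rotate replaces all three values in one atomic store.
func (k *discoKey) rotate(private, public string) {
	short := public
	if len(short) > 8 {
		short = short[:8]
	}
	k.cur.Store(&discoState{private: private, public: public, short: short})
}

// load returns a consistent snapshot, or the zero value if unset.
func (k *discoKey) load() discoState {
	if s := k.cur.Load(); s != nil {
		return *s
	}
	return discoState{}
}

func main() {
	var k discoKey
	k.rotate("private-1", "0123456789abcdef")
	fmt.Println(k.load().short)
	k.rotate("private-2", "fedcba9876543210")
	fmt.Println(k.load().short)
}
```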
…17955)

We now embed node information into network flow logs.
By default, netlogfmt still prints out using Tailscale IP addresses.
Support a "--resolve-addrs=TYPE" flag that can be used to specify
resolving IP addresses as node IDs, hostnames, users, or tags.

Updates tailscale/corp#33352

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
Updates tailscale/corp#25406

Change-Id: I7832dbe3dce3774bcc831e3111feb75bcc9e021d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
(trying to get in smaller obvious chunks ahead of later PRs to make
them smaller)

Updates #17925

Change-Id: I184002001055790484e4792af8ffe2a9a2465b2e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This commit modifies the helm/static manifest configuration for the
k8s-operator to prefer the stable image tag. This prevents those using
static manifests from seeing unstable behaviour by default if they do
not manually make the change.

This is managed for us when using helm but not when generating the
static manifests.

Updates #10655

Signed-off-by: David Bond <davidsbond93@gmail.com>
Previously a TKA compaction would only run when a node starts, which means a long-running node could use unbounded storage as it accumulates ever-increasing amounts of TKA state. This patch changes TKA so it runs a compaction after every sync.

Updates tailscale/corp#33537

Change-Id: I91df887ea0c5a5b00cb6caced85aeffa2a4b24ee
Signed-off-by: Alex Chan <alexc@tailscale.com>
…7981)

ArgoCD sends boolean values but the template expects strings, causing
"incompatible types for comparison" errors. Wrap values with toString
so both work.

Fixes #17158

Signed-off-by: Raj Singh <raj@tailscale.com>
Our style guide recommends avoiding Latin abbreviations in technical
documentation, which includes the CLI help text. This is causing linter
issues for the docs site, because this help text is copied into the docs.
See http://go/style-guide/kb/language-and-grammar/abbreviations#latin-abbreviations

Updates #cleanup

Change-Id: I980c28d996466f0503aaaa65127685f4af608039
Signed-off-by: Alex Chan <alexc@tailscale.com>
Updates #cleanup

Change-Id: Ib7b497e22c6cdd80578c69cf728d45754e6f909e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Now that we support using an in-memory backend for TKA state (#17946),
this function always returns `nil` – we can always support Network Lock.
We don't need it any more.

Plus, clean up a couple of errant TODOs from that PR.

Updates tailscale/corp#33599

Change-Id: Ief93bb9adebb82b9ad1b3e406d1ae9d2fa234877
Signed-off-by: Alex Chan <alexc@tailscale.com>
Signed-off-by: License Updater <noreply+license-updater@tailscale.com>
This commit enables users to set a service backend to a remote destination, which can be a partial
URL or a full URL. The commit also prevents users from setting remote destinations on Linux systems
when the socket mark is not working. Users on any version of the mac extension can't serve a
service either. The socket mark usability is determined by a new local API.

Fixes tailscale/corp#24783

Signed-off-by: KevinLiang10 <37811973+KevinLiang10@users.noreply.github.com>
This commit modifies the kubernetes operator to use the "stable" version
of `k8s-nameserver` by default.

Updates: tailscale/corp#19028

Signed-off-by: David Bond <davidsbond93@gmail.com>
From #17842

Updates #cleanup

Change-Id: Ie041b50659361b50558d5ec1f557688d09935f7c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
fixes #17990

The logging for the netns caps is spammy. Log only on changes
to the values and don't log Darwin-specific stuff on non-Darwin
clients.

Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
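The "log only on changes" approach from the commit above can be sketched as follows. This is an assumption about the general technique, not the netns code itself; `changeLogger` and `report` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// changeLogger is a hypothetical helper that emits a value only when it
// differs from the previously logged one.
type changeLogger struct {
	mu      sync.Mutex
	last    string
	hasLast bool
	logf    func(format string, args ...any)
}

// report logs v if it changed since the last call, and reports whether
// a log line was emitted.
func (c *changeLogger) report(v string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.hasLast && c.last == v {
		return false
	}
	c.last, c.hasLast = v, true
	c.logf("caps changed: %s", v)
	return true
}

func main() {
	c := &changeLogger{logf: func(format string, args ...any) {
		fmt.Printf(format+"\n", args...)
	}}
	c.report("bindToInterface=true")
	c.report("bindToInterface=true") // unchanged, so nothing is logged
	c.report("bindToInterface=false")
}
```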
This commit adds the `spec.replicas` field to the `Recorder` custom
resource that allows for a highly available deployment of `tsrecorder`
within a kubernetes cluster.

Many changes were required here as the code hard-coded the assumption
of a single replica. This has required a few loops, similar to what we
do for the `Connector` resource to create auth and state secrets. It
was also required to add a check to remove dangling state and auth
secrets should the recorder be scaled down.

Updates: #17965

Signed-off-by: David Bond <davidsbond93@gmail.com>
These validations were previously performed in the CLI frontend. There
are two motivations for moving these to the local backend:
1. The backend controls synchronization around the relevant state, so
   only the backend can guarantee many of these validations.
2. Doing these validations in the back-end avoids the need to repeat
   them across every frontend (e.g. the CLI and tsnet).

Updates tailscale/corp#27200

Signed-off-by: Harry Harpham <harry@tailscale.com>
Pick up fixes for https://pkg.go.dev/vuln/GO-2025-4134

Updates #cleanup

Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
…pen (#17883)

With the introduction of node sealing, store.New fails in some cases due
to the TPM device being reset or unavailable. Currently it results in
tailscaled crashing at startup, which is not obvious to the user until
they check the logs.

Instead of crashing tailscaled at startup, start with an in-memory store
with a health warning about state initialization and a link to (future)
docs on what to do. When this health message is set, also block any
login attempts to avoid masking the problem with an ephemeral node
registration.

Updates #15830
Updates #17654

Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
This is causing confusing panics in tailscale/corp#34485. We'll keep
using the tka.ChonkMem constructor as much as we can, but don't panic
if you create a tka.Mem directly -- we know what the sensible thing is.

Updates #cleanup

Signed-off-by: Alex Chan <alexc@tailscale.com>

Change-Id: I49309f5f403fc26ce4f9a6cf0edc8eddf6a6f3a4
…mutex as a publisher

As of 2025-11-20, publishing more events than the eventbus's
internal queues can hold may deadlock if a subscriber tries
to acquire a mutex that can also be held by a publisher.

This commit adds a test that demonstrates this deadlock,
and skips it until the bug is fixed.

Updates #17973

Signed-off-by: Nick Khyl <nickk@tailscale.com>
As of 2025-11-20, publishing more events than the eventbus's
internal queues can hold may deadlock if a subscriber tries
to publish events itself.

This commit adds a test that demonstrates this deadlock,
and skips it until the bug is fixed.

Updates #18012

Signed-off-by: Nick Khyl <nickk@tailscale.com>
Updates #17996

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
Updates #17830

Signed-off-by: Jordan Whited <jordan@tailscale.com>
…cribers

Bounded DeliveredEvent queues reduce memory usage, but they can deadlock under load.
Two common scenarios trigger deadlocks when the number of events published in a short
period exceeds twice the queue capacity (there's a PublishedEvent queue of the same size):
 - a subscriber tries to acquire the same mutex as held by a publisher, or
 - a subscriber for A events publishes B events

Avoiding these scenarios is not practical and would limit eventbus usefulness and reduce its adoption,
pushing us back to callbacks and other legacy mechanisms. These deadlocks already occurred in customer
devices, dev machines, and tests. They also make it harder to identify and fix slow subscribers and similar
issues we have been seeing recently.

Choosing an arbitrary large fixed queue capacity would only mask the problem. A client running
on a sufficiently large and complex customer environment can exceed any meaningful constant limit,
since event volume depends on the number of peers and other factors. Behavior also changes
based on scheduling of publishers and subscribers by the Go runtime, OS, and hardware, as the issue
is essentially a race between publishers and subscribers. Additionally, on lower-end devices,
an unreasonably high constant capacity is practically the same as using unbounded queues.

Therefore, this PR changes the event queue implementation to be unbounded by default.
The PublishedEvent queue keeps its existing capacity of 16 items, while subscribers'
DeliveredEvent queues become unbounded.

This change fixes known deadlocks and makes the system stable under load,
at the cost of higher potential memory usage, including cases where a queue grows
during an event burst and does not shrink when load decreases.

Further improvements can be implemented in the future as needed.

Fixes #17973
Fixes #18012

Signed-off-by: Nick Khyl <nickk@tailscale.com>
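To make the trade-off in the commit above concrete, here is a minimal sketch of an unbounded FIFO queue whose `add` never blocks on capacity, which is the property that removes this class of deadlock. It is not the eventbus implementation; `queue`, `add`, `tryTake`, and `size` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// queue is a hypothetical unbounded FIFO. add never waits for a consumer,
// so a publisher cannot stall while holding some other lock.
type queue[T any] struct {
	mu    sync.Mutex
	items []T
}

// add appends v; it only holds the queue's own mutex briefly.
func (q *queue[T]) add(v T) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.items = append(q.items, v)
}

// tryTake removes and returns the oldest item, if any.
func (q *queue[T]) tryTake() (T, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	var zero T
	if len(q.items) == 0 {
		return zero, false
	}
	v := q.items[0]
	q.items[0] = zero // drop the reference so it can be collected
	q.items = q.items[1:]
	return v, true
}

// size reports the number of queued items.
func (q *queue[T]) size() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return len(q.items)
}

func main() {
	q := &queue[int]{}
	for i := 0; i < 1000; i++ {
		q.add(i) // far more than 16 items, with no consumer running
	}
	first, _ := q.tryTake()
	fmt.Println(first, q.size())
}
```

As the commit message notes, the cost is memory: the backing storage grows during a burst and nothing in this sketch shrinks it afterwards.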
Linux kernel versions 6.6.102-104 and 6.12.42-45 have a regression
in /proc/net/tcp that causes seek operations to fail with "illegal seek".
This breaks portlist tests on these kernels.

Add kernel version detection for Linux systems and a SkipOnKernelVersions
helper to tstest. Use it to skip affected portlist tests on the broken
kernel versions.

Thanks to philiptaron for the list of kernels with the issue and fix.

Updates #16966

Signed-off-by: Andrew Dunham <andrew@tailscale.com>
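The version check behind such a skip can be sketched as below, using the kernel ranges quoted in the commit above. The real `SkipOnKernelVersions` signature is not shown in this log, so `kernelVersion`, `parseKernelVersion`, and `hasSeekRegression` are hypothetical names for illustration only.

```go
package main

import "fmt"

// kernelVersion is a hypothetical parsed form of a release string
// such as "6.6.103-generic".
type kernelVersion struct {
	major, minor, patch int
}

// parseKernelVersion reads the leading "major.minor.patch" and ignores
// any suffix. It reports false if fewer than three numbers are present.
func parseKernelVersion(release string) (kernelVersion, bool) {
	var v kernelVersion
	n, _ := fmt.Sscanf(release, "%d.%d.%d", &v.major, &v.minor, &v.patch)
	return v, n == 3
}

// hasSeekRegression reports whether v falls in the ranges the commit
// lists as affected: 6.6.102-104 and 6.12.42-45.
func hasSeekRegression(v kernelVersion) bool {
	switch {
	case v.major == 6 && v.minor == 6:
		return v.patch >= 102 && v.patch <= 104
	case v.major == 6 && v.minor == 12:
		return v.patch >= 42 && v.patch <= 45
	}
	return false
}

func main() {
	for _, r := range []string{"6.6.103-generic", "6.6.105", "6.12.42"} {
		v, ok := parseKernelVersion(r)
		fmt.Println(r, ok, hasSeekRegression(v))
	}
}
```

In a test, a helper along these lines would call `t.Skip` when the running kernel matches.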
bradfitz and others added 30 commits February 3, 2026 09:10
Updates tailscale/go#149

Change-Id: If0483466eb1fc2196838c75f6d53925b1809abff
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
…8568)

Not all Linux distros use systemd yet. For example, GL.iNet KVM devices
use busybox's init, which is similar to SysV init.
This is a best-effort restart attempt after the update; it probably
won't cover 100% of init.d setups out there.

Fixes #18567

Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
…8356)

When the NodeAttrDNSSubdomainResolve capability is present, enable
wildcard certificate issuance to cover all single-level subdomains
of a node's CertDomain.

Without the capability, only exact CertDomain matches are allowed,
so node.ts.net yields a cert for node.ts.net. With the capability,
we now generate wildcard certificates. Wildcard certs include both
the wildcard and base domain in their SANs, and ACME authorization
requests both identifiers. The cert filenames are still based on the
base domain with the wildcard prefix stripped, so we aren't creating
separate files. DNS challenges still use the base domain.

The checkCertDomain function is replaced by resolveCertDomain that
both validates and returns the appropriate cert domain to request.
Name validation is now moved earlier into GetCertPEMWithValidity()

Fixes #1196

Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
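The SAN and filename rules described in the commit above can be illustrated with a small sketch. It only mirrors the prose (wildcard plus base domain in the SANs, wildcard prefix stripped for filenames); `certSANs` and `certFileBase` are hypothetical helpers, the SAN ordering is an assumption, and neither reflects the real `resolveCertDomain` signature.

```go
package main

import (
	"fmt"
	"strings"
)

// certSANs returns the names a certificate would cover. Without the
// capability only the exact domain is included; with it, both the
// wildcard and the base domain are. The ordering is an assumption.
func certSANs(certDomain string, wildcard bool) []string {
	if !wildcard {
		return []string{certDomain}
	}
	return []string{"*." + certDomain, certDomain}
}

// certFileBase strips a leading wildcard label so that wildcard and
// non-wildcard certs for the same node share one filename base.
func certFileBase(name string) string {
	return strings.TrimPrefix(name, "*.")
}

func main() {
	fmt.Println(certSANs("node.ts.net", false))
	fmt.Println(certSANs("node.ts.net", true))
	fmt.Println(certFileBase("*.node.ts.net"))
}
```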
This resolves a gap in test coverage, ensuring Server.ListenService
functions as expected in combination with user-supplied TUN devices

Fixes tailscale/corp#36603

Co-authored-by: Harry Harpham <harry@tailscale.com>
Signed-off-by: Harry Harpham <harry@tailscale.com>
We already had a featuretag for clientupdate, but the CLI wasn't using
it, making the "minbox" build (minimal combined tailscaled + CLI
build) larger than necessary.

Updates #12614

Change-Id: Idd7546c67dece7078f25b8f2ae9886f58d599002
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
…uretags

Package feature/conn25 is excludable from a build via the featuretag.
Test that it is excluded for minimal builds.

Updates #12614

Signed-off-by: Fran Bull <fran@tailscale.com>
Updates #12614

Change-Id: I49351fe0c463af0b8d940e8088d4748906a8aec3
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Use the parsed and validated advertise tags value from prefs instead of
doing a strings.Split on the raw tags value as an input to the OAuth and
identity federation auth key generation methods.

The previous strings.Split method would return an array with a single
empty string element which would pass downstream length checks on the
tags argument before eventually failing with a confusing message when
hitting the API.

Fixes #18617

Signed-off-by: Mario Minardi <mario@tailscale.com>
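The empty-element behaviour described in the commit above is easy to reproduce. The sketch below shows that splitting an empty string yields a slice of length one, and contrasts it with a filtering helper. `naiveSplit` and `splitTags` are hypothetical names used only for illustration; the actual fix reads the already parsed and validated tags from prefs instead.

```go
package main

import (
	"fmt"
	"strings"
)

// naiveSplit mirrors the problematic pattern: splitting the raw value
// without checking whether it is empty.
func naiveSplit(raw string) []string {
	return strings.Split(raw, ",")
}

// splitTags is a hypothetical helper that drops empty elements, so an
// empty input yields no tags at all.
func splitTags(raw string) []string {
	var tags []string
	for _, t := range strings.Split(raw, ",") {
		if t = strings.TrimSpace(t); t != "" {
			tags = append(tags, t)
		}
	}
	return tags
}

func main() {
	naive := naiveSplit("")
	// One element, and that element is the empty string, so a
	// "len(tags) > 0" check downstream passes by mistake.
	fmt.Println(len(naive), naive[0] == "")
	fmt.Println(len(splitTags("")))
	fmt.Println(splitTags("tag:a, tag:b"))
}
```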
If any profiles exist and an Authkey is provided via syspolicy, the
AuthKey is ignored on backend start, preventing re-auth attempts. This
is useful for one-time device provisioning scenarios, skipping authKey
use after initial setup when the authKey may no longer be valid.

updates #18618

Signed-off-by: Will Hannah <willh@tailscale.com>
Updates #12614
Updates #18562

Change-Id: Ife4f10c55d1d68569938ffd68ffe72eef889e200
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
concurrent netmaps that if the first is logged in, it is never skipped.
This should have been covered by the skip test case, but that case
wasn't updated to include level set state.

Updates #12639
Updates #17869

Signed-off-by: James Tucker <james@tailscale.com>
Currently the expvar exporter attempts to write expvar.String, which
breaks the Prometheus metric page.

Updates tailscale/corp#36552

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
Updates #cleanup

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
Under extremely high load it appears we may have some retention issues
as a result of queue depth build up, but there is currently no direct
way to observe this. The scenario does not trigger the slow subscriber
log message, and the event stream debugging endpoint produces a
saturating volume of information.

Updates tailscale/corp#36904

Signed-off-by: James Tucker <james@tailscale.com>
Updates #18629

Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
This commit adds a bool named PeerRelay to Hostinfo, to identify whether the host is acting as a peer relay.
Considering the RelayServerPort number can be 0, I just made this a bool instead of a port number. If the port
info is needed in the future, this would also help indicate whether the port was set to 0 (meaning any port in peer relay
context).

Updates tailscale/corp#35862

Signed-off-by: KevinLiang10 <37811973+KevinLiang10@users.noreply.github.com>
… omittable

Add new "webbrowser" and "colorable" feature tags so that the
github.com/toqueteos/webbrowser and mattn/go-colorable packages
can be excluded from minbox builds.

Updates #12614

Change-Id: Iabd38b242f5a56aa10ef2050113785283f4e1fe8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This updates the URL shown by systemd to the new URL used by the docs
after the recent migration.

Fixes #18646

Signed-off-by: Tim Walters <tim@tailscale.com>
wiki.nixos.org is and has been the official wiki for quite some time now.

Signed-off-by: faukah <fau@faukah.com>
bart has gained a bunch of purported performance and usability
improvements since the current version we are using (0.18.0,
from 1y ago)

Updates tailscale/corp#36982

Signed-off-by: Amal Bansode <amal@tailscale.com>
app connector packets

We introduce the Conn25PacketHooks interface to be used as a nil-able
field in userspaceEngine. The engine then plumbs through the functions
to the corresponding tstun.Wrapper intercepts.

The new intercepts run pre-filter when egressing toward WireGuard,
and post-filter when ingressing from WireGuard. This is to preserve the
design invariant that the filter recognizes the traffic as interesting
app connector traffic.

This commit does not plumb through implementation of the interface, so
should be a functional no-op.

Fixes tailscale/corp#35985

Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>
Fixes #18118

Change-Id: I118fcc6537af9ccbdc7ce6b78134e8059b0b5ccf
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
-Wait does not just wait for the created process; it waits for the
entire process tree rooted at that process! This can cause the shell
to wait indefinitely if something in that tree fired up any background
processes.

Instead we call WaitForExit on the returned process.

Updates tailscale/corp#29940

Signed-off-by: Aaron Klotz <aaron@tailscale.com>
Fixes #18631

Signed-off-by: Becky Pauley <becky@tailscale.com>
…-dns is false (#18572)

fixes #18436

Queries can still make their way to the forwarder when accept-dns is disabled.
Since we have not configured the forwarder if --accept-dns is false, this errors out
(correctly) but it also generates a persistent health warning.   This forwards the
Pref setting all the way through the stack to the forwarder so that we can be more
judicious about when we decide that the forward path is unintentionally missing, vs
simply not configured.

Testing:
tailscale set --accept-dns=false. (or from the GUI)
dig @100.100.100.100 example.com
tailscale status

No dns related health warnings should be surfaced.

Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
…e Synchronize hack

Restore synchronous method calls from LocalBackend to magicsock.Conn
for node views, filter, and delta mutations. The eventbus delivery
introduced in 8e6f63c was invalid for these updates because
subsequent operations in the same call chain depend on magicsock
already having the current state. The Synchronize/settleEventBus
workaround was fragile and kept requiring more workarounds and
introducing new mystery bugs.

Since eventbus was added, we've learned more about when to use
eventbus, and this wasn't one of the cases.

We can take another swing at using eventbus for netmap changes in a
future change.

Fixes #16369
Updates #18575 (likely fixes)

Change-Id: I79057cc9259993368bb1e350ff0e073adf6b9a8f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Updates rotateLocked so that we hold the activeStderrWriteForTest write
lock around the dup2Stderr call, rather than acquiring it only after
dup2 was already complete. This ensures no stderrWriteForTest calls
can race with the dup2 syscall. The now unused waitIdleStderrForTest has
been removed.

On macOS, dup2 and write on the same file descriptor are not atomic with
respect to each other. When rotateLocked called dup2Stderr to redirect
the stderr fd to a new file, concurrent goroutines calling
stderrWriteForTest could observe the fd in a transiently invalid state,
resulting in a bad file descriptor error.

Fixes tailscale/corp#36953

Signed-off-by: James Scott <jim@tailscale.com>
Signed-off-by: License Updater <noreply+license-updater@tailscale.com>
…18681)

When traffic steering is enabled, some users are suggested an exit
node that is inappropriately far from their location. This seems to
happen right when the client connects to the control plane and the
client eventually fixes itself. But whenever an affected client
reconnects, its suggested exit node flaps, and this happens often
enough to be noticeable because connections drop whenever the exit
node is switched. This should not happen, since the map response that
contains the list of suggested exit nodes the client picks from
also contains the scores for those nodes.

Since our current logging and diagnostic tools don’t give us enough
insight into what is happening, this PR adds additional logging when:
- traffic steering scores are used to suggest an exit node
- an exit node is suggested, no matter how it was determined

Updates: tailscale/corp#29964
Updates: tailscale/corp#36446

Signed-off-by: Simon Law <sfllaw@tailscale.com>
This updates the TS_GO_NEXT=1 (testing) toolchain to Go 1.26.0

The default one is still Go 1.25.x.

Updates #18682

Change-Id: I99747798c166ce162ee9eee74baa9ff6744a62f6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>