Andrew Elwell
11/07/2025, 5:10 AM
- name: content_modifier
  action: extract
  key: 'message'
  # print one line per reply, with time, IP, name, type, class, rcode, timetoresolve, fromcache and responsesize.
  pattern: '^(?<source_ip>[^ ]+) (?<query_request>[^ ]+) (?<query_record_type>[^ ]+) (?<query_class>[^ ]+) (?<query_result>[^ ]+) (?<unbound_time_to_resolve>[^ ]+) (?<unbound_from_cache>[^ ]+) (?<query_response_length>.+)$'
  condition:
    op: and
    rules:
      - field: '$message_type'
        op: eq
        value: 'reply'
- name: content_modifier
  action: convert
  key: unbound_time_to_resolve
  converted_type: int
- name: content_modifier
  action: convert
  key: query_response_length
  converted_type: int
- name: content_modifier
  action: convert
  key: unbound_from_cache
  converted_type: boolean
but I'm still getting:
[3] log_unbound: [[1762491366.000000000, {}], {"process"=>"unbound", "pid"=>934, "tid"=>"1", "message_type"=>"reply", "message"=>"127.0.0.1 vmetrics1.pawsey.org.au. AAAA IN NOERROR 0.000000 1 109", "gim_event_type_code"=>"140200", "source_ip"=>"127.0.0.1", "query_request"=>"vmetrics1.pawsey.org.au.", "query_record_type"=>"AAAA", "query_class"=>"IN", "query_result"=>"NOERROR", "unbound_time_to_resolve"=>"0.000000", "unbound_from_cache"=>"1", "query_response_length"=>"109", "cluster"=>"DNS", "event_reporter"=>"ns0"}]
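For reference, a minimal sketch of how these processors would typically be attached to an input in fluent-bit.yaml (the tail input and its path are illustrative, not from the thread). Note also that "0.000000" is a fractional string, so converted_type: double may be a better fit than int for unbound_time_to_resolve:

pipeline:
  inputs:
    - name: tail                  # illustrative input; any log input works
      path: /var/log/unbound.log  # hypothetical path
      tag: unbound
      processors:
        logs:
          - name: content_modifier
            action: extract
            key: 'message'
            pattern: '<regex as above>'
          - name: content_modifier
            action: convert
            key: unbound_time_to_resolve
            converted_type: double  # assumption: int may reject a fractional string like "0.000000"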
gcol
11/07/2025, 7:49 AM
Alex
11/07/2025, 8:52 AM
[error] [output:opentelemetry:opentelemetry.3] could not flush records (http_do=-1)
With Log_Level debug (pos=5 or pos=0):
[debug] [yyjson->msgpack] read error code=6 msg=unexpected character, expected a JSON value pos=0
For some reason I don't see the error from the info log_level in the debug output. (JFYI)
Questions:
1. Is forwarding OTLP traces from OpenTelemetry input to OpenTelemetry output supposed to work in Fluent Bit?
2. Should I use the Raw_Traces On parameter? (tried both with and without: same errors)
[INPUT]
    Name          opentelemetry
    Listen        0.0.0.0
    Port          4318
    Tag           otel
    Tag_From_Uri  Off

[OUTPUT]
    Name          opentelemetry
    Match         otel
    Host          ${OSIS_PIPELINE_HOST}
    Port          443
    Traces_uri    /v1/traces
    Tls           On
    Tls.verify    On
    aws_auth      On
    aws_service   osis
    aws_region    ${AWS_REGION}
Thanks in advance! 🙏
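One way to narrow this down (a sketch, assuming the listener is reachable locally on 4318): post a minimal OTLP/HTTP JSON payload to the input and watch whether the output flush still fails:

curl -sS -X POST http://localhost:4318/v1/traces \
  -H "Content-Type: application/json" \
  -d '{"resourceSpans":[]}'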
Dean Meehan
11/07/2025, 2:14 PM
I'm trying to map fields.service_name to the OTEL Resource service.name.
Fluentbit Tail: {"event": "my log message", "fields": {"service_name": "my_service", "datacenter": "eu-west"}}
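A rough sketch of one approach, assuming a recent Fluent Bit where the opentelemetry_envelope and content_modifier processors are available (the tail path is hypothetical; content_modifier values are static strings, so copying service_name dynamically out of the record would likely need a Lua filter instead):

pipeline:
  inputs:
    - name: tail
      path: /var/log/app.log  # hypothetical path
      tag: app
      processors:
        logs:
          - name: opentelemetry_envelope  # wraps records in OTLP resource/scope groups
          - name: content_modifier
            context: otel_resource_attributes
            action: upsert
            key: service.name
            value: my_service  # static value; mapping it from fields.service_name is the open question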
Gmo1492
11/07/2025, 4:38 PM
Gmo1492
11/07/2025, 4:38 PM
Gmo1492
11/07/2025, 4:39 PM
Gmo1492
11/07/2025, 4:39 PM
Reading state information... Done
E: Unable to locate package fluent-bit
[fluent-bit][error] APT install failed (vendor repo unreachable and Ubuntu archive install failed).
Gmo1492
11/07/2025, 4:39 PM
Gmo1492
11/07/2025, 4:42 PM
apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)). (My setup was outdated.) Once I changed it to follow the latest install docs, it fails.
Scott Bisker
11/07/2025, 4:46 PM
Gmo1492
11/07/2025, 4:48 PM
Jason A
11/07/2025, 5:38 PM
amazon-ebs: E: Failed to fetch https://packages.fluentbit.io/ubuntu/noble/dists/noble/InRelease 522
amazon-ebs: E: The repository 'https://packages.fluentbit.io/ubuntu/noble noble InRelease' is no longer signed.
amazon-ebs: N: Updating from such a repository can't be done securely, and is therefore disabled by default.
amazon-ebs: N: See apt-secure(8) manpage for repository creation and user configuration details.
==> amazon-ebs: Provisioning step had errors: Running the cleanup provis
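For what it's worth, the keyring-based repository setup from the current install docs looks roughly like this (noble hardcoded here; the 522 above is a server-side/CDN error, so this only helps once the repo itself is reachable again):

curl -fsSL https://packages.fluentbit.io/fluentbit.key | \
    gpg --dearmor -o /usr/share/keyrings/fluentbit-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/fluentbit-keyring.gpg] https://packages.fluentbit.io/ubuntu/noble noble main" | \
    tee /etc/apt/sources.list.d/fluent-bit.list
apt-get update && apt-get install -y fluent-bit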
Scott Bisker
11/07/2025, 6:19 PM
Jason A
11/07/2025, 6:21 PM
Josh
11/07/2025, 6:52 PM
Gmo1492
11/07/2025, 7:06 PM
Celalettin
11/07/2025, 7:23 PM
Saksham
11/10/2025, 8:23 AM
Bryson Edwards
11/10/2025, 10:49 PM
{
  "namespace": "test"
}
I would want:
{
  "some_new_field": "some_new_value",
  "spec": {
    "namespace": "test"
  }
}
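A minimal sketch of one way to get there with stock filters, assuming the YAML config format (the match pattern is illustrative): modify adds the new top-level field and nest moves the existing key under spec:

pipeline:
  filters:
    - name: modify
      match: '*'
      add: some_new_field some_new_value
    - name: nest
      match: '*'
      operation: nest
      wildcard: namespace   # keys to move
      nest_under: spec      # target map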
Michael Marshall
11/11/2025, 9:51 PM
Post "http://192.168.141.95:9880/services/collector": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
I have tried lots of options and configurations, but my current one is CLI-based:
root@ip-192-168-141-95:~# /opt/fluent-bit/bin/fluent-bit -i splunk -p port=9880 -p buffer_chunk_size=1024 -p buffer_max_size=32M -p tag=splunk.logs -p net.io_timeout=300s -o stdout -p match=splunk.logs -vv
which is producing:
[2025/11/11 21:44:09.381347930] [trace] [io] connection OK
[2025/11/11 21:44:09.381397730] [trace] [sched] 0 timer coroutines destroyed
[2025/11/11 21:44:09.381863699] [trace] [io coro=(nil)] [net_read] try up to 1024 bytes
[2025/11/11 21:44:09.381894442] [trace] [io coro=(nil)] [net_read] ret=1024
... (the same try/ret=1024 pair repeats roughly 20 more times) ...
[2025/11/11 21:44:09.383193275] [trace] [io coro=(nil)] [net_read] try up to 1024 bytes
[2025/11/11 21:44:09.383203629] [trace] [io coro=(nil)] [net_read] ret=706
[2025/11/11 21:44:09.383216611] [trace] [sched] 0 timer coroutines destroyed
[2025/11/11 21:44:09.879452869] [trace] [io coro=(nil)] [net_read] try up to 1024 bytes
[2025/11/11 21:44:09.879531281] [trace] [io coro=(nil)] [net_read] ret=0
[2025/11/11 21:44:09.879549725] [trace] [downstream] destroy connection #48 to tcp://192.168.141.95:46304
[2025/11/11 21:44:10.95119333] [trace] [io coro=(nil)] [net_read] try up to 1024 bytes
[2025/11/11 21:44:10.95162342] [trace] [io coro=(nil)] [net_read] ret=0
[2025/11/11 21:44:10.95179536] [trace] [downstream] destroy connection #49 to tcp://192.168.141.95:46314
... (periodic "[sched] 0 timer coroutines destroyed" lines omitted) ...
[2025/11/11 21:44:12.173087049] [trace] [io] connection OK
[2025/11/11 21:44:12.173810559] [trace] [io coro=(nil)] [net_read] try up to 1024 bytes
[2025/11/11 21:44:12.173841862] [trace] [io coro=(nil)] [net_read] ret=1024
... (the same try/ret=1024 pair repeats roughly 10 more times) ...
[2025/11/11 21:44:12.174441221] [trace] [io coro=(nil)] [net_read] try up to 1024 bytes
[2025/11/11 21:44:12.174447781] [trace] [io coro=(nil)] [net_read] ret=314
[2025/11/11 21:44:12.430735078] [trace] [io coro=(nil)] [net_read] try up to 1024 bytes
[2025/11/11 21:44:12.430779926] [trace] [io coro=(nil)] [net_read] ret=0
[2025/11/11 21:44:12.430796710] [trace] [downstream] destroy connection #52 to tcp://192.168.141.95:46322
Any ideas?
When I switched it to TCP, I get:
[2025/11/11 21:49:12.454217350] [ info] [fluent bit] version=4.1.1, commit=, pid=7654
[2025/11/11 21:49:12.454345650] [ info] [storage] ver=1.5.3, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2025/11/11 21:49:12.454355937] [ info] [simd ] SSE2
[2025/11/11 21:49:12.454363428] [ info] [cmetrics] version=1.0.5
[2025/11/11 21:49:12.454371187] [ info] [ctraces ] version=0.6.6
[2025/11/11 21:49:12.454441883] [ info] [input:tcp:tcp.0] initializing
[2025/11/11 21:49:12.454450891] [ info] [input:tcp:tcp.0] storage_strategy='memory' (memory only)
[2025/11/11 21:49:12.455168829] [ info] [sp] stream processor started
[2025/11/11 21:49:12.455347140] [ info] [output:stdout:stdout.0] worker #0 started
[2025/11/11 21:49:12.455396357] [ info] [engine] Shutdown Grace Period=5, Shutdown Input Grace Period=2
"}] tcp.0: [[1762897752.520261984, {}], {"log"=>"POST /services/collector HTTP/1.1
"}] tcp.0: [[1762897752.520272277, {}], {"log"=>"Host: 192.168.141.95:9880
"}] tcp.0: [[1762897752.520273812, {}], {"log"=>"User-Agent: OpenTelemetry Collector Contrib/11f9362e
"}] tcp.0: [[1762897752.520275124, {}], {"log"=>"Content-Length: 44970
"}] tcp.0: [[1762897752.520276343, {}], {"log"=>"Authorization: Splunk my_token
"}] tcp.0: [[1762897752.520277527, {}], {"log"=>"Connection: keep-alive
"}] tcp.0: [[1762897752.520278816, {}], {"log"=>"Content-Encoding: gzip
"}] tcp.0: [[1762897752.520280153, {}], {"log"=>"Content-Type: application/json
"}] tcp.0: [[1762897752.520281350, {}], {"log"=>"__splunk_app_name: OpenTelemetry Collector Contrib
"}] tcp.0: [[1762897752.520282527, {}], {"log"=>"__splunk_app_version:
"}]] tcp.0: [[1762897752.520283955, {}], {"log"=>"Accept-Encoding: gzip
"}]] tcp.0: [[1762897752.520285037, {}], {"log"=>"Connection: close
"}]] tcp.0: [[1762897752.520286276, {}], {"log"=>"Michael Marshall
Michael Marshall
11/11/2025, 9:52 PM
Victor Nilsson
11/12/2025, 2:02 PM
---
pipeline:
  inputs:
    - name: systemd
      tag: systemd.*
      read_from_tail: on
      threaded: true
      lowercase: on
      db: /fluent-bit/db/systemd.db
      storage.type: memory  # Filesystem buffering is not needed for tail input since the files are stored locally.
      mem_buf_limit: 250M
      alias: in_systemd
We have set db as well as read_from_tail: on, so our expectation was that the fluent-bit container should not resend already-processed logs. Is this true?
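The systemd input's db stores the last-read journal cursor, so one sanity check is to inspect it across restarts (a sketch; the table layout is an assumption and may differ across versions):

sqlite3 /fluent-bit/db/systemd.db '.tables'
# then dump the cursor row; the table name below is an assumption:
sqlite3 /fluent-bit/db/systemd.db 'SELECT * FROM in_systemd;'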
Andrew Elwell
11/13/2025, 2:31 AM
Michael Marshall
11/13/2025, 3:27 PM
DennyF
11/13/2025, 3:44 PM
Megha Aggarwal
11/13/2025, 7:18 PM
Gabriel Alacchi
11/13/2025, 10:02 PM
With storage.type=filesystem we see a rapid leak of memory in use by the fluent-bit pod in k8s, growing to as much as 16GB after a day or so without a pod restart. Fluent-bit itself is not consuming much memory, maybe a few hundred MB; rather, the kernel slab associated with the container cgroup accounts for all of the excess memory, and slabtop claims the VFS dentry cache accounts for all of those leaked kernel objects.
Since we are buffering a large number of chunks per second, we easily create hundreds of chunk files per second, which leaks dentries rather quickly. Even after a file is deleted, the kernel keeps a negative dentry caching its non-existence, and those aren't purged from the kernel cache easily unless the system is under memory pressure. More context on this topic: https://lwn.net/Articles/894098/
Is this dentry cache bloat a well-known problem in the fluent-bit community? Are there good solutions / workarounds?
Some workarounds we've considered, though we're looking for guidance from maintainers and the community:
1. Raise VFS cache pressure on the nodes (a sketch follows this list). I'm not 100% sure how much this changes VFS cache behavior here, nor what perf consequences it could have for the other workloads on the node. It's worth experimenting with.
2. Periodically reboot fluent-bit pods. This resets the pod's memory accounting, but it doesn't actually clean up the bloat in the dentry cache, since that cache is system-wide. If the system comes under memory pressure, the sheer volume of dentries could lock it up. It feels like sweeping a bigger problem under the rug.
3. Periodically migrate the fluent-bit storage directory to a new directory and delete the old one. Supposedly when a directory is deleted, a negative dentry is kept for the directory itself, but the nested entries are pruned since they are now redundant. I think this is the most plausible option, since we can wrap fluent-bit in a script that gracefully shuts it down, reconfigures, and restarts it; no code changes are required in fluent-bit itself. But how do we handle periods of backpressure when there is an existing backlog of chunks?
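A sketch of workaround 1 (the value is illustrative; the default is 100, and higher values make the kernel reclaim dentries and inodes more aggressively):

# inspect the current value
sysctl vm.vfs_cache_pressure
# raise it, e.g. to 200
sysctl -w vm.vfs_cache_pressure=200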
One idea to improve things within fluent-bit itself would be to reuse chunk file names so the cached dentries can be reused. Alternatively, store FS chunks in larger pre-allocated files with block-arena-style memory management, which may be more efficient; you can always add more files or extend the block arena if the FS storage buffer needs to grow.
CC @Pandu Aji
Rafael Martinez Guerrero
11/14/2025, 1:50 PM
Phil Wilkins
11/14/2025, 9:43 PM