Scott Bisker
11/07/2025, 6:19 PM
Jason A
11/07/2025, 6:21 PM
Josh
11/07/2025, 6:52 PM
Gmo1492
11/07/2025, 7:06 PM
Celalettin
11/07/2025, 7:23 PM
Saksham
11/10/2025, 8:23 AM
Bryson Edwards
11/10/2025, 10:49 PM
{
  "namespace": "test"
}
I would want:
{
  "some_new_field": "some_new_value",
  "spec": {
    "namespace": "test"
  }
}
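One way to get that shape is the nest filter followed by the modify filter; a minimal sketch (untested, field names taken from the example above):

pipeline:
  filters:
    # Move the matched key(s) under a new "spec" map.
    - name: nest
      match: '*'
      operation: nest
      wildcard: namespace
      nest_under: spec
    # Add the new top-level field alongside "spec".
    - name: modify
      match: '*'
      add: some_new_field some_new_value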
Michael Marshall
11/11/2025, 9:51 PM
Post "<http://192.168.141.95:9880/services/collector>": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
I have tried lots of options and configurations, but my current one is CLI-based:
root@ip-192-168-141-95:~# /opt/fluent-bit/bin/fluent-bit -i splunk -p port=9880 -p buffer_chunk_size=1024 -p buffer_max_size=32M -p tag=splunk.logs -p net.io_timeout=300s -o stdout -p match=splunk.logs -vv
which is producing:
[2025/11/11 21:44:09.381347930] [trace] [io] connection OK
[2025/11/11 21:44:09.381397730] [trace] [sched] 0 timer coroutines destroyed
[2025/11/11 21:44:09.381863699] [trace] [io coro=(nil)] [net_read] try up to 1024 bytes
[2025/11/11 21:44:09.381894442] [trace] [io coro=(nil)] [net_read] ret=1024
[2025/11/11 21:44:09.382594157] [trace] [io coro=(nil)] [net_read] try up to 1024 bytes
[2025/11/11 21:44:09.382625300] [trace] [io coro=(nil)] [net_read] ret=1024
[... 20 more net_read try/ret=1024 pairs and two scheduler ticks trimmed ...]
[2025/11/11 21:44:09.383193275] [trace] [io coro=(nil)] [net_read] try up to 1024 bytes
[2025/11/11 21:44:09.383203629] [trace] [io coro=(nil)] [net_read] ret=706
[2025/11/11 21:44:09.383216611] [trace] [sched] 0 timer coroutines destroyed
[2025/11/11 21:44:09.431509514] [trace] [sched] 0 timer coroutines destroyed
[2025/11/11 21:44:09.681537238] [trace] [sched] 0 timer coroutines destroyed
[2025/11/11 21:44:09.681554644] [trace] [sched] 0 timer coroutines destroyed
[2025/11/11 21:44:09.879452869] [trace] [io coro=(nil)] [net_read] try up to 1024 bytes
[2025/11/11 21:44:09.879531281] [trace] [io coro=(nil)] [net_read] ret=0
[2025/11/11 21:44:09.879549725] [trace] [downstream] destroy connection #48 to <tcp://192.168.141.95:46304>
[2025/11/11 21:44:09.879621135] [trace] [sched] 0 timer coroutines destroyed
[2025/11/11 21:44:09.931509675] [trace] [sched] 0 timer coroutines destroyed
[2025/11/11 21:44:10.95119333] [trace] [io coro=(nil)] [net_read] try up to 1024 bytes
[2025/11/11 21:44:10.95162342] [trace] [io coro=(nil)] [net_read] ret=0
[2025/11/11 21:44:10.95179536] [trace] [downstream] destroy connection #49 to <tcp://192.168.141.95:46314>
[2025/11/11 21:44:10.95247475] [trace] [sched] 0 timer coroutines destroyed
[2025/11/11 21:44:10.181511800] [trace] [sched] 0 timer coroutines destroyed
[2025/11/11 21:44:10.431508128] [trace] [sched] 0 timer coroutines destroyed
[2025/11/11 21:44:10.681546565] [trace] [sched] 0 timer coroutines destroyed
[2025/11/11 21:44:10.681585263] [trace] [sched] 0 timer coroutines destroyed
[2025/11/11 21:44:10.931508179] [trace] [sched] 0 timer coroutines destroyed
[2025/11/11 21:44:11.181514100] [trace] [sched] 0 timer coroutines destroyed
[2025/11/11 21:44:11.431510732] [trace] [sched] 0 timer coroutines destroyed
[2025/11/11 21:44:11.681539544] [trace] [sched] 0 timer coroutines destroyed
[2025/11/11 21:44:11.931508704] [trace] [sched] 0 timer coroutines destroyed
[2025/11/11 21:44:12.173087049] [trace] [io] connection OK
[2025/11/11 21:44:12.173199150] [trace] [sched] 0 timer coroutines destroyed
[2025/11/11 21:44:12.173810559] [trace] [io coro=(nil)] [net_read] try up to 1024 bytes
[2025/11/11 21:44:12.173841862] [trace] [io coro=(nil)] [net_read] ret=1024
[2025/11/11 21:44:12.173872772] [trace] [io coro=(nil)] [net_read] try up to 1024 bytes
[2025/11/11 21:44:12.173883888] [trace] [io coro=(nil)] [net_read] ret=1024
[... 8 more net_read try/ret=1024 pairs trimmed ...]
[2025/11/11 21:44:12.174435379] [trace] [sched] 0 timer coroutines destroyed
[2025/11/11 21:44:12.174441221] [trace] [io coro=(nil)] [net_read] try up to 1024 bytes
[2025/11/11 21:44:12.174447781] [trace] [io coro=(nil)] [net_read] ret=314
[2025/11/11 21:44:12.174457878] [trace] [sched] 0 timer coroutines destroyed
[2025/11/11 21:44:12.181508649] [trace] [sched] 0 timer coroutines destroyed
[2025/11/11 21:44:12.181507560] [trace] [sched] 0 timer coroutines destroyed
[2025/11/11 21:44:12.430735078] [trace] [io coro=(nil)] [net_read] try up to 1024 bytes
[2025/11/11 21:44:12.430779926] [trace] [io coro=(nil)] [net_read] ret=0
[2025/11/11 21:44:12.430796710] [trace] [downstream] destroy connection #52 to <tcp://192.168.141.95:46322>
[2025/11/11 21:44:12.430866695] [trace] [sched] 0 timer coroutines destroyed
[2025/11/11 21:44:12.431506047] [trace] [sched] 0 timer coroutines destroyed
[2025/11/11 21:44:12.681535932] [trace] [sched] 0 timer coroutines destroyed
Any ideas?
When I switched it to tcp, I get:
[Fluent Bit v4.1 ASCII-art startup banner]
[2025/11/11 21:49:12.454217350] [ info] [fluent bit] version=4.1.1, commit=, pid=7654
[2025/11/11 21:49:12.454345650] [ info] [storage] ver=1.5.3, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2025/11/11 21:49:12.454355937] [ info] [simd ] SSE2
[2025/11/11 21:49:12.454363428] [ info] [cmetrics] version=1.0.5
[2025/11/11 21:49:12.454371187] [ info] [ctraces ] version=0.6.6
[2025/11/11 21:49:12.454441883] [ info] [input:tcp:tcp.0] initializing
[2025/11/11 21:49:12.454450891] [ info] [input:tcp:tcp.0] storage_strategy='memory' (memory only)
[2025/11/11 21:49:12.455168829] [ info] [sp] stream processor started
[2025/11/11 21:49:12.455347140] [ info] [output:stdout:stdout.0] worker #0 started
[2025/11/11 21:49:12.455396357] [ info] [engine] Shutdown Grace Period=5, Shutdown Input Grace Period=2
"}] tcp.0: [[1762897752.520261984, {}], {"log"=>"POST /services/collector HTTP/1.1
"}] tcp.0: [[1762897752.520272277, {}], {"log"=>"Host: 192.168.141.95:9880
"}] tcp.0: [[1762897752.520273812, {}], {"log"=>"User-Agent: OpenTelemetry Collector Contrib/11f9362e
"}] tcp.0: [[1762897752.520275124, {}], {"log"=>"Content-Length: 44970
"}] tcp.0: [[1762897752.520276343, {}], {"log"=>"Authorization: Splunk my_token
"}] tcp.0: [[1762897752.520277527, {}], {"log"=>"Connection: keep-alive
"}] tcp.0: [[1762897752.520278816, {}], {"log"=>"Content-Encoding: gzip
"}] tcp.0: [[1762897752.520280153, {}], {"log"=>"Content-Type: application/json
"}] tcp.0: [[1762897752.520281350, {}], {"log"=>"__splunk_app_name: OpenTelemetry Collector Contrib
"}] tcp.0: [[1762897752.520282527, {}], {"log"=>"__splunk_app_version:
"}]] tcp.0: [[1762897752.520283955, {}], {"log"=>"Accept-Encoding: gzip
"}]] tcp.0: [[1762897752.520285037, {}], {"log"=>"Connection: close
"}]] tcp.0: [[1762897752.520286276, {}], {"log"=>"Michael Marshall
Michael Marshall
11/11/2025, 9:52 PM
Victor Nilsson
11/12/2025, 2:02 PM
---
pipeline:
  inputs:
    - name: systemd
      tag: systemd.*
      read_from_tail: on
      threaded: true
      lowercase: on
      db: /fluent-bit/db/systemd.db
      storage.type: memory  # Filesystem buffering is not needed for tail input since the files are stored locally.
      mem_buf_limit: 250M
      alias: in_systemd
We have set db as well as read_from_tail: on, so our thinking was that the fluent-bit container should not resend already-processed logs. Is this true?
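One way to check what the input actually persisted, as a sketch; the table and column layout below is an assumption (the systemd input's on-disk DB schema isn't documented), so confirm with .schema first:

# Inspect the saved journal cursor:
sqlite3 /fluent-bit/db/systemd.db '.schema'
sqlite3 /fluent-bit/db/systemd.db 'SELECT * FROM in_systemd;'
# Then see which journal position that cursor resolves to:
journalctl --after-cursor='<cursor value from the DB>' -n 5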
Andrew Elwell
11/13/2025, 2:31 AM
Michael Marshall
11/13/2025, 3:27 PM
DennyF
11/13/2025, 3:44 PM
Megha Aggarwal
11/13/2025, 7:18 PM
Gabriel Alacchi
11/13/2025, 10:02 PM
With storage.type=filesystem we see a rapid leak of memory in use by the fluent-bit pod in k8s, growing to as much as 16GB after a day or so without a pod restart. Fluent-bit itself is not consuming much memory, maybe a few hundred MB; rather, the kernel slab associated with the container's cgroup accounts for all of the excess memory, and slabtop shows the VFS dentry cache accounting for all of those leaked kernel objects.
The behavior we're seeing: since we are buffering a large number of chunks per second, we easily create hundreds of chunk files per second, which leaks dentries rather quickly. Even on file deletion the kernel keeps negative dentries, which cache the non-existence of a file, and they aren't purged from the kernel cache easily unless the system is under memory pressure. More context on this topic: https://lwn.net/Articles/894098/
Is this dentry cache bloat a well-known problem in the fluent-bit community? Are there good solutions / workarounds?
Some workarounds we've considered, but we are looking for guidance from maintainers & community:
1. Raise VFS cache pressure on the nodes. I'm not 100% sure how much this changes VFS cache behavior here, nor what performance consequences it has for the other workloads on the node, but it's worth experimenting with (see the sketch after this message).
2. Periodically restart fluent-bit pods. This resets the pod's memory accounting but doesn't actually clean up the bloat in the dentry cache, since that is a system-wide cache. If the system gets into memory pressure, the sheer volume of dentries could lock it up. Feels like sweeping a bigger problem under the rug.
3. Periodically migrate the fluent-bit storage directory to another directory and delete the old one. Supposedly when a directory is deleted, a negative dentry is kept for it, but nested entries are pruned since they are now redundant. I think this is the most plausible option, since we can wrap fluent-bit in a script that gracefully shuts it down, reconfigures, and restarts it; no code changes are required in fluent-bit itself. How do we handle periods of backpressure when there is an existing backlog of chunks?
One idea to improve things within fluent-bit itself would be to re-use chunk file names so the cached dentries can be re-used. Either that, or use larger pre-allocated files with block-arena-like memory management to store FS chunks, which may be more efficient; you can always add more files or extend the block arena if the FS storage buffer needs to grow.
CC @Pandu Aji
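On workaround 1, a minimal sketch of what raising the cache pressure looks like (the value is illustrative, not a tested recommendation):

# Default is 100; values above 100 make the kernel reclaim dentry/inode
# caches more aggressively relative to the page cache.
sysctl -w vm.vfs_cache_pressure=200
# Persist across reboots:
echo 'vm.vfs_cache_pressure=200' > /etc/sysctl.d/90-vfs-cache.conf
# Watch dentry counts while experimenting:
cat /proc/sys/fs/dentry-state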
Rafael Martinez Guerrero
11/14/2025, 1:50 PM
Phil Wilkins
11/14/2025, 9:43 PM
Andrew Elwell
11/16/2025, 10:05 PM
service:
  http_server: on
  http_listen: 0.0.0.0
  http_port: 2020
https://docs.fluentbit.io/manual/administration/monitoring
vs
https://docs.fluentbit.io/manual/data-pipeline/inputs/fluentbit-metrics and sending those to a prometheus_exporter as described in the docs?
Does one expose more metrics, give better coverage of the pipeline, or consume fewer resources?
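For comparison, a minimal sketch of the second approach from the linked docs page (the port is arbitrary; 2021 matches the curl example in the next message):

pipeline:
  inputs:
    - name: fluentbit_metrics
      tag: internal_metrics
      scrape_interval: 2
  outputs:
    - name: prometheus_exporter
      match: internal_metrics
      host: 0.0.0.0
      port: 2021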
Andrew Elwell
11/16/2025, 10:35 PM
/metrics - is this expected?
[aelwell@admiral ~]$ curl -s <http://127.0.0.1:2021/blah/randomshit../../../../../../../../metrics> | head
# HELP fluentbit_uptime Number of seconds that Fluent Bit has been running.
# TYPE fluentbit_uptime counter
fluentbit_uptime{hostname="admiral"} 122936
# HELP fluentbit_logger_logs_total Total number of logs
# TYPE fluentbit_logger_logs_total counter
fluentbit_logger_logs_total{message_type="error"} 0
fluentbit_logger_logs_total{message_type="warn"} 0
fluentbit_logger_logs_total{message_type="info"} 20
fluentbit_logger_logs_total{message_type="debug"} 0
fluentbit_logger_logs_total{message_type="trace"} 0
ah, there's a choice of only
static void cb_metrics(mk_request_t *request, void *data)
static void cb_root(mk_request_t *request, void *data)
Sagi Rosenthal
11/17/2025, 9:30 AM
Building with cd build && cmake .. -DFLB_TESTS_RUNTIME=On && make and getting lib_crypto issues:
Undefined symbols for architecture arm64:
"_EVP_MD_size", referenced from:
_flb_hmac_init in flb_hmac.c.o
_flb_hash_init in flb_hash.c.o
"_EVP_PKEY_size", referenced from:
_flb_crypto_init in flb_crypto.c.o
ld: symbol(s) not found for architecture arm64
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [lib/libfluent-bit.dylib] Error 1
make[1]: *** [src/CMakeFiles/fluent-bit-shared.dir/all] Error 2
make: *** [all] Error 2
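If this is macOS on Apple Silicon, one common cause (an assumption here, not confirmed from the log) is the link picking up an incomplete libcrypto instead of a full OpenSSL 3; pointing CMake's FindOpenSSL at Homebrew's copy is worth a try:

brew install openssl@3
cd build
cmake .. -DFLB_TESTS_RUNTIME=On \
      -DOPENSSL_ROOT_DIR="$(brew --prefix openssl@3)"
make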
Megha Aggarwal
11/17/2025, 7:12 PM
Sanath Ramesh
11/18/2025, 9:00 AM
[INPUT]
    Name             tail
    Path             /var/log/cloud-init.log
    Buffer_Max_Size  128k
    Mem_Buf_Limit    16384k
    Skip_Long_Lines  On
    Path_Key         filePath
    Tag              cloud-init.log
    DB               /var/db/newrelic-infra/newrelic-integrations/logging/fb.db
[INPUT]
    Name             tail
    Path             /var/log/messages
    Buffer_Max_Size  128k
    Mem_Buf_Limit    16384k
    Skip_Long_Lines  On
    Path_Key         filePath
    Tag              messages
    DB               /var/db/newrelic-infra/newrelic-integrations/logging/fb.db
[INPUT]
    Name             tail
    Path             /var/log/secure
    Buffer_Max_Size  128k
    Mem_Buf_Limit    16384k
    Skip_Long_Lines  On
    Path_Key         filePath
    Tag              secure
    DB               /var/db/newrelic-infra/newrelic-integrations/logging/fb.db
[INPUT]
    Name             tail
    Path             /var/log/yum.log
    Buffer_Max_Size  128k
    Mem_Buf_Limit    16384k
    Skip_Long_Lines  On
    Path_Key         filePath
    Tag              yum.log
    DB               /var/db/newrelic-infra/newrelic-integrations/logging/fb.db
[INPUT]
    Name             tail
    Path             /root/.newrelic/newrelic-cli.log
    Buffer_Max_Size  128k
    Mem_Buf_Limit    16384k
    Skip_Long_Lines  On
    Path_Key         filePath
    Tag              newrelic-cli.log
    DB               /var/db/newrelic-infra/newrelic-integrations/logging/fb.db
Before the Agent stops (fb.db):
sqlite> select * from in_tail_files where name = '/var/log/test.log';
id|name|offset|inode|created|rotated
33|/var/log/test.log|85|12713160|1762270005|0
After the Infrastructure Agent starts (fb.db):
sqlite> select * from in_tail_files where name = '/var/log/test.log';
id|name|offset|inode|created|rotated
41|/var/log/test.log|114|12713160|1762271396|0
Please note the change in the created timestamp.
We tried the same on other operating systems and architectures, but they all seem to work fine.
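One precaution worth testing (an assumption, not a confirmed fix): give each tail input its own DB file instead of sharing fb.db across all five inputs, e.g.:

[INPUT]
    Name             tail
    Path             /var/log/messages
    Buffer_Max_Size  128k
    Mem_Buf_Limit    16384k
    Skip_Long_Lines  On
    Path_Key         filePath
    Tag              messages
    # hypothetical per-input DB file name, one per tail input
    DB               /var/db/newrelic-infra/newrelic-integrations/logging/fb-messages.db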
Phil Wilkins
11/18/2025, 11:59 AM
Celalettin
11/18/2025, 12:53 PM
Denis
11/19/2025, 7:58 AM
Anders
11/19/2025, 8:08 AM
Without systemd-input:
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      105637  0.3  1.1 133760 43556 ?        Ssl  08:51   0:00 /opt/fluent-bit/bin/fluent-bit -c /etc/fluent-bit/fluent-bit.yaml
With systemd-input:
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      105691  0.6  1.6 614252 63588 ?        Ssl  08:52   0:00 /opt/fluent-bit/bin/fluent-bit -c /etc/fluent-bit/fluent-bit.yaml
systemd-input configuration:
pipeline:
  inputs:
    ...
    - name: systemd
      tag: 'log.systemd.*'
      db: /var/spool/fluent-bit/logs.db
      lowercase: true
      strip_underscores: true
      mem_buf_limit: 50MB
      storage.type: memory
    ...
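To see where the extra virtual memory actually goes (mapped libraries, journal mmaps, arenas), pmap on the two PIDs is more informative than VSZ alone; a sketch using the PID from the second ps line above:

# Largest mappings last (column 2 is Kbytes):
pmap -x 105691 | sort -k2 -n | tail -20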
DennyF
11/19/2025, 1:23 PM
DennyF
11/19/2025, 1:25 PM
devsecops
11/19/2025, 1:50 PM
DennyF
11/19/2025, 2:40 PM
pipeline:
  inputs:
    - name: syslog
      listen: 127.0.0.1
      port: 5140
      parser: syslog_rfc3339
      tag: syslog
      mode: udp
      buffer_chunk_size: 32000
      buffer_max_size: 64000
      receive_buffer_size: 512000
?
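A quick way to exercise that input once it's listening (util-linux logger; -d sends UDP datagrams, -n/-P pick the destination):

logger -d -n 127.0.0.1 -P 5140 "quick test message"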