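Background
On the 25.12 LTS release, after 1,200 OS reboot test cycles (about 5.5 minutes per cycle), the following alarm was raised: The data written to the NAND flash in last 15 days exceeds 12G
Checking the NandFlash properties shows that the daily write volume exceeds 4 GB: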
openUBMC:/->mdbctl lsprop NandFlash_00 bmc.kepler.Managers.NandFlash
bmc.kepler.Managers.NandFlash
EraseCountPerDays=[2,1,2,2,1,1]
LifeTime=89
LifeTimeThreshold=10
MaxEraseCount=3000
OverWrite=1
PreEOL=1
RWTestState=0
RemainingEraseCount=2690
ReservedBlocks=90
RuntimeBadBlocks=0
TotalWriteAmount=21.21
TotalWriteAmountIn15Days=21.21
Vendor="Hynix"
WriteAmountPerDay=[4.52,4.506,4.57,4.34,2.961,0.314]
WriteProtection=0
WriteThreshold=12
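To put the numbers above in context, here is a minimal sketch that compares the reported daily writes with the average daily budget implied by the alarm. It assumes that WriteAmountPerDay and WriteThreshold are both in GB and that the 15-day window comes from the alarm text; the variable names are illustrative only.

```python
# Values copied from the mdbctl output above (units assumed to be GB).
write_amount_per_day = [4.52, 4.506, 4.57, 4.34, 2.961, 0.314]
write_threshold_gb = 12   # WriteThreshold
window_days = 15          # assumed from the alarm text: "in last 15 days"

total_gb = sum(write_amount_per_day)
daily_budget_gb = write_threshold_gb / window_days
peak_gb = max(write_amount_per_day)

print(f"total over {len(write_amount_per_day)} days: {total_gb:.2f} GB")
print(f"average daily budget: {daily_budget_gb:.2f} GB")
print(f"peak day vs. budget: {peak_gb / daily_budget_gb:.1f}x")
```

The six daily values add up to the reported TotalWriteAmountIn15Days, so the threshold was already exceeded after roughly three full days of testing.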
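Preliminary analysis
Following the approach described in the openUBMC forum post "BMC NandFlash写入量过大问题的定位思路分享" (a troubleshooting guide for excessive BMC NandFlash writes, in the 交流互助区 board), the writes can be traced mainly to the framework's database writes: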
~ /opt/bmc/apps/hica # sqlite3 /data/trust/persistence/per_poweroff.db .dbinfo
database page size: 1024
write format: 1
read format: 1
reserved bytes: 0
file change counter: 120134
database page count: 779
freelist page count: 7
schema cookie: 4
schema format: 4
default cache size: 0
autovacuum top root: 0
incremental vacuum: 0
text encoding: 1 (utf8)
user version: 0
application id: 0
software version: 3042000
number of tables: 3
number of indexes: 3
number of triggers: 0
number of views: 0
schema size: 655
data version 2
~ /opt/bmc/apps/hica # sqlite3 /opt/bmc/pram/persistence.local/maca.db .dbinfo
database page size: 4096
write format: 1
read format: 1
reserved bytes: 0
file change counter: 507668
database page count: 7
freelist page count: 0
schema cookie: 3
schema format: 4
default cache size: 0
autovacuum top root: 0
incremental vacuum: 0
text encoding: 1 (utf8)
user version: 0
application id: 0
software version: 3042000
number of tables: 3
number of indexes: 3
number of triggers: 0
number of views: 0
schema size: 432
data version 1
~ /opt/bmc/apps/hica # sqlite3 /opt/bmc/pram/persistence/per_reset.db .dbinfo
database page size: 4096
write format: 1
read format: 1
reserved bytes: 0
file change counter: 37564
database page count: 76
freelist page count: 0
schema cookie: 2
schema format: 4
default cache size: 0
autovacuum top root: 0
incremental vacuum: 0
text encoding: 1 (utf8)
user version: 0
application id: 0
software version: 3042000
number of tables: 2
number of indexes: 2
number of triggers: 0
number of views: 0
schema size: 451
data version 1
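Judging by the file change counter, the three most frequently written databases are per_poweroff.db (about 120,000 changes), maca.db (about 510,000), and per_reset.db (about 38,000).
maca.db has a simple structure, so the cause of its frequent writes is easy to pin down. The other two are harder to trace.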
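For the two databases that are harder to trace, one option is to sample the file change counter over time and see how fast it grows. The sketch below reads the counter straight from the SQLite file header (page size at byte offset 16, change counter at offset 24, size in pages at offset 28, all big-endian). The helper names are hypothetical, and it assumes the databases use the rollback-journal mode that "write format: 1" indicates; in WAL mode the counter is not necessarily bumped on every transaction.

```python
import struct

def read_sqlite_header(path):
    """Return (page_size, change_counter, page_count) from a SQLite file header."""
    with open(path, "rb") as f:
        header = f.read(100)
    if len(header) < 100 or header[:16] != b"SQLite format 3\x00":
        raise ValueError(f"{path} is not a SQLite 3 database")
    # Page size is a 2-byte big-endian value at offset 16.
    (page_size,) = struct.unpack(">H", header[16:18])
    if page_size == 1:  # the value 1 encodes a 65536-byte page
        page_size = 65536
    # Change counter (offset 24) and size in pages (offset 28), 4 bytes each.
    change_counter, page_count = struct.unpack(">II", header[24:32])
    return page_size, change_counter, page_count

def commits_per_second(counter_before, counter_after, seconds):
    """Average change-counter increments per second between two samples."""
    return (counter_after - counter_before) / seconds
```

Taking two samples of the same file a known interval apart and passing the two counters to commits_per_second gives an approximate commit rate, which can then be correlated with the components that were active during that interval.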
maca.db
sqlite> .tables
t_maca_service t_system_reset t_watchdog
sqlite> select * from t_maca_service;
sqlite> select * from t_system_reset;
sqlite> select * from t_watchdog;
Id LastWatchdogTime
-- ----------------
1 1778049279
sqlite> select * from t_watchdog;
Id LastWatchdogTime
-- ----------------
1 1778049280
sqlite> select * from t_watchdog;
Id LastWatchdogTime
-- ----------------
1 1778049283
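This shows the cause: the maca component records the watchdog feed time about once per second. This looks worth optimizing, because every write costs at least 4 KB (one database page).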
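As a rough lower bound on what this costs per day, the arithmetic can be sketched as follows. It assumes exactly one 4096-byte page is rewritten per update, ignores the rollback journal and any filesystem or flash write amplification, and assumes the file is actually backed by NAND flash, which this post does not confirm for the /opt/bmc/pram path.

```python
PAGE_SIZE_BYTES = 4096         # maca.db "database page size" from .dbinfo
UPDATES_PER_SECOND = 1         # approximate rate seen in the t_watchdog queries
SECONDS_PER_DAY = 24 * 60 * 60
DAILY_BUDGET_MB = 800          # daily limit quoted in this post

bytes_per_day = PAGE_SIZE_BYTES * UPDATES_PER_SECOND * SECONDS_PER_DAY
mb_per_day = bytes_per_day / 1_000_000

print(f"maca.db lower bound: {mb_per_day:.0f} MB/day "
      f"({mb_per_day / DAILY_BUDGET_MB:.0%} of the {DAILY_BUDGET_MB} MB budget)")
```

Under these assumptions the watchdog timestamp alone would consume a large share of the daily budget, independent of how often the OS reboots.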
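Help requested
Under frequent reboot testing, the current database write mechanism cannot stay below the limit of 800 MB written per day (the 12 GB threshold spread over 15 days). I would like to confirm whether this is expected behavior and whether the database writes need to be optimized.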
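If optimization turns out to be needed, one possible direction for the watchdog timestamp is to keep the latest value in memory and persist it only once per interval. The sketch below is a hypothetical illustration, not the maca component's actual code: the class name, the 60-second interval, and the t_watchdog column types are all assumptions.

```python
import sqlite3

class ThrottledWatchdogStore:
    """Hypothetical helper: persist LastWatchdogTime at most once per interval."""

    def __init__(self, conn, min_interval_s=60):
        self.conn = conn
        self.min_interval_s = min_interval_s
        self.last_persisted = None
        self.writes = 0
        # Assumed schema, modelled on the t_watchdog output shown earlier.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS t_watchdog ("
            "Id INTEGER PRIMARY KEY, LastWatchdogTime INTEGER)"
        )
        conn.commit()

    def feed(self, now_s):
        """Record a watchdog feed; return True if it was written to the database."""
        if (self.last_persisted is not None
                and now_s - self.last_persisted < self.min_interval_s):
            return False  # too soon: skip the database write
        self.conn.execute(
            "INSERT OR REPLACE INTO t_watchdog (Id, LastWatchdogTime) VALUES (1, ?)",
            (now_s,),
        )
        self.conn.commit()
        self.last_persisted = now_s
        self.writes += 1
        return True

# Usage: 600 one-second feeds with a 60-second interval.
store = ThrottledWatchdogStore(sqlite3.connect(":memory:"), min_interval_s=60)
for t in range(1778049279, 1778049279 + 600):
    store.feed(t)
print(store.writes)
```

The trade-off is that after an unexpected reset the persisted time can be stale by up to the chosen interval. Whether that is acceptable depends on how LastWatchdogTime is consumed, which this post does not describe.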
