Grafana
Repairing a Bitnami Grafana Database
A case study on how I repaired grafana.db
I had the bitnami/grafana Helm chart deployed on my k8s cluster.
One day our Grafana stopped responding, and the pod logs showed a "no space left on device" error.
Checking disk usage with du -ah --max-depth=1 showed that grafana.db was eating up the whole volume. A single alert rule had made the Grafana database grow quickly until the disk was full.
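The disk usage check can be sketched as a small helper. This is only an illustration: it assumes GNU du and sort (as found in Debian-based images), the function name is made up, and the example path is the data directory used later in this post.

```shell
# List the entries directly under a directory, largest first.
# Assumes GNU coreutils (du --max-depth, sort -h).
largest_entries() {
  dir="${1:-.}"
  du -ah --max-depth=1 "$dir" | sort -rh | head -n 10
}

# Example: largest_entries /opt/bitnami/grafana/data
```

The output also includes a line for the directory itself, which is the total.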
Grafana internally uses SQLite as its database to store data such as users, alerts, and permissions.
One option was to wipe everything and set Grafana up again, but that would have lost my custom dashboard configuration, so I tried to fix the underlying issue instead.
I decided to purge some tables to free up space.
I looked for a built-in tool or command from Grafana to clean things up but couldn't find one. Exploring grafana-cli didn't help either.
Somewhere along the way I ended up frying my database, and the following error started appearing:
Error: database disk image is malformed
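SQLite's built-in integrity check is one way to confirm this kind of damage. A minimal sketch, assuming the sqlite3 CLI is available (installing it is covered next) and that it is run against a copy of the file; the function name is hypothetical:

```shell
# Print "ok" if SQLite considers the file healthy, otherwise a list of problems.
# PRAGMA integrity_check is a built-in SQLite command.
check_db() {
  sqlite3 "$1" "PRAGMA integrity_check;"
}

# Example: check_db /opt/bitnami/grafana/data/grafana.db
```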
After some searching, I found a way to back up an SQLite database using the sqlite3 CLI, which is easy to install:
apt update
apt install -y sqlite3
But wait. My Grafana pod didn't allow me to run these commands as root. I tried setting runAsUser: 0 in the pod's security context, but with that change the Grafana pod wouldn't start.
After some ChatGPT-ing, I found a way to start a temporary pod that mounts the same volume, so I could install and use sqlite3 as root:
apiVersion: v1
kind: Pod
metadata:
  name: grafana-debug
  namespace: grafana
spec:
  containers:
    - name: debug-container
      image: bitnami/grafana:latest
      command: ["sleep", "infinity"]
      securityContext:
        runAsUser: 0
      volumeMounts:
        - name: grafana-data
          mountPath: /opt/bitnami/grafana/data
  volumes:
    - name: grafana-data
      persistentVolumeClaim:
        claimName: grafana
  restartPolicy: Never
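Getting a root shell in this pod might look roughly like the sketch below. The manifest file name grafana-debug.yaml and the function names are assumptions for illustration; the pod name and namespace come from the manifest above. The KUBECTL variable is only there so the commands can be dry-run with KUBECTL=echo.

```shell
# Sketch only: the manifest file name is an assumption.
KUBECTL="${KUBECTL:-kubectl}"   # override (e.g. KUBECTL=echo) for a dry run

start_debug_pod() {
  manifest="${1:-grafana-debug.yaml}"   # hypothetical file holding the manifest
  # Create the pod and wait until it reports Ready.
  $KUBECTL apply -f "$manifest"
  $KUBECTL wait --for=condition=Ready pod/grafana-debug -n grafana --timeout=120s
}

open_debug_shell() {
  # Interactive shell inside the debug pod (root, because of runAsUser: 0).
  $KUBECTL exec -it -n grafana grafana-debug -- bash
}

cleanup_debug_pod() {
  # Remove the temporary pod once the repair is done.
  $KUBECTL delete pod grafana-debug -n grafana
}
```

If the volume is ReadWriteOnce, the debug pod may only start on the node where the Grafana pod runs, or after Grafana has been scaled down. Stopping Grafana first also avoids it writing to the database during the repair.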
With this pod running, I first installed sqlite3 and then ran the following command, where database.db stands for the damaged database file (grafana.db in my case):
sqlite3 database.db ".dump" | sqlite3 database.new
This command creates a new database by dumping the data of the old one and loading it into a fresh file. There is a chance that some data is lost because of the corruption, but for me it did the job. After this, I would rename the old file to grafana-old.db for safety and then rename database.new to grafana.db.
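The rename step can be sketched as follows. The function name is hypothetical, and the ownership fix is an assumption on my part: the rebuilt file was created as root in the debug pod, so its owner may differ from what the Grafana container expects. The --reference flag is specific to GNU chown and chmod.

```shell
# Swap the rebuilt database into place, keeping the damaged one as a backup.
# Assumes Grafana is not using these files while they are moved.
swap_db() {
  dir="$1"
  mv "$dir/grafana.db" "$dir/grafana-old.db" &&
    mv "$dir/database.new" "$dir/grafana.db" &&
    # Copy owner and mode from the old file to the new one (GNU coreutils).
    chown --reference="$dir/grafana-old.db" "$dir/grafana.db" &&
    chmod --reference="$dir/grafana-old.db" "$dir/grafana.db"
}

# Example: swap_db /opt/bitnami/grafana/data
```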
But there's a catch. Because my database was malformed, the dump script ended with a ROLLBACK statement, which left the new database with a size of 0.
To work around this, I came across this Stack Overflow answer:
sqlite3 database.db ".dump" | sed -e 's|^ROLLBACK;\( -- due to errors\)*$|COMMIT;|g' | sqlite3 database.new
This command adds sed, a stream editor, to the pipeline. It replaces the ROLLBACK statement with COMMIT so that the imported data is kept.
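To see what the sed expression does in isolation, here is a small sketch that runs it on a hand-written sample instead of a real dump. The function name and the sample statements are made up for illustration; the sed expression is the one from the command above.

```shell
# Rewrite a line that is exactly ROLLBACK; (with or without the
# "-- due to errors" note) into COMMIT;. Other lines pass through unchanged.
fix_dump() {
  sed -e 's|^ROLLBACK;\( -- due to errors\)*$|COMMIT;|g'
}

# Hand-written sample resembling the end of a dump from a damaged database.
printf '%s\n' \
  'BEGIN TRANSACTION;' \
  'CREATE TABLE t(x);' \
  'INSERT INTO t VALUES(1);' \
  'ROLLBACK; -- due to errors' | fix_dump
```

The last line of the output becomes COMMIT;, so the statements before it are committed when the script is fed to sqlite3.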
And guess what? After some time, my database was fully copied. I renamed the files as described above, and my Grafana was up and running again.
This approach took a lot of time, but I learned a lot about recovering things after a disaster, which will definitely help me in the future.
Voilà! Both you and I learned something new today. Congrats!
👏 👏 👏