When dealing with log management, obvious solutions emerge early, often before you have even discussed the purpose of the log management. We could talk about ELK, Graylog, Splunk, … Those are great tools, but they may not fit all your needs. Lately, I had to work for one of my customers on enforcing log management for billing purposes. Rsyslog was already set up to collect and centralize all the logs (and manage their backups). MongoDB seemed a perfect fit for storing JSON extracts of the logs to generate the proper stats.
JSON is standard
On top of that format, we can use a more structured syntax adapted to log management: the CEE Log Syntax. This format is well supported by the various tools you might use, in the present case nginx, rsyslog, and mongodb. Up to a certain level…
Nginx knows about @cee
Sort of. Nginx has a native syslog export for both error_log and access_log. Out of the box, it simply pushes all logs to syslog using the default log formats. This is not very convenient for proper management afterwards. But nginx is easy to use, and you can define your own custom format. You’ll find plenty of blog posts, tutorials, and other resources, such as Any IT here? Help me!, describing the proper @cee JSON log format you need:
They all give you the very same structure. Not sure who started it, but that’s not the point: they’re all wrong about it!
I am this close - Terminator 2
They are close, indeed, but some details have to be fixed first. Why are they wrong? Because JSON, and especially the @cee version of it, has datatypes. Numbers (integers and floats) should not be wrapped in quotes: quotes are for strings only.
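To make the datatype point concrete, here is a minimal Python sketch comparing the same log record with and without quotes around its numeric values. The field names and values are hypothetical, chosen for illustration only, and are not taken from any particular nginx log format.

```python
import json

# Hypothetical payloads: the field names are illustrative assumptions.
quoted = '{"status": "200", "body_bytes_sent": "512", "request_time": "0.004"}'
typed = '{"status": 200, "body_bytes_sent": 512, "request_time": 0.004}'


def field_types(payload: str) -> dict:
    """Return the decoded Python type name of each field in a JSON object."""
    return {key: type(value).__name__ for key, value in json.loads(payload).items()}


quoted_types = field_types(quoted)  # every value decodes as a string
typed_types = field_types(typed)    # numbers decode as int / float

print(quoted_types)
print(typed_types)
```

Any consumer that sums bytes or averages request times has to convert the quoted variant first, which is exactly the extra work proper typing avoids.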
Now we are good to go. Then, as all the documentation states, we can use this log format to push logs to syslog:
Of course, you can adapt the facility, the tag, and the severity, along with the log format name.
Rsyslog is great. Its documentation is not.
That’s a fact: technical documentation, written by technical people for technical people, is not always the best. Rsyslog might be one of the best examples of this. Rainer Gerhards did great work with his tool, one of the best syslog managers in my humble opinion. The documentation, though, is hell to read. Add to this the fact that it is not as accurate as it should be.
In the use case I was working on, the infrastructure runs rsyslog v8-stable. And as stated, we want to use the mongodb exporter, aka ommongodb. Seems easy and straightforward? Almost. But remember this IT mantra:
If all goes well, we forget something.
For reasons that escape me as I write this post, I can tell you that it just does not work. Several aspects have to be considered before dealing with those logs.
- You need to take care of the encoding: BSON (the JSON variant mongodb uses internally) only supports valid UTF-8 characters. You therefore need to fix the encoding of the logs before sending them. There is a module for that.
- With the default configuration, the structured @cee data is pushed to mongodb “as-is”: the first and main issue you’ll hit is the date, which is pushed as a regular string.
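The encoding point in the first bullet can be sketched in a few lines of Python. This is only an illustration of the idea (replace every byte sequence that is not valid UTF-8 before the record reaches BSON), not a description of how the rsyslog module is implemented; the function name and the choice of a space as replacement character are assumptions.

```python
def sanitize_utf8(raw: bytes, replacement: str = " ") -> str:
    """Decode bytes as UTF-8, swapping each undecodable byte for `replacement`.

    Note: a U+FFFD character already present in valid input is replaced too.
    """
    decoded = raw.decode("utf-8", errors="replace")
    return decoded.replace("\ufffd", replacement)


# 0xE9 is 'é' in Latin-1, but it is invalid as a lone byte in UTF-8.
line = b"GET /caf\xe9 HTTP/1.1"
clean = sanitize_utf8(line)

print(repr(clean))
```

The cleaned string can always be re-encoded as UTF-8, which is the property BSON requires.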
But why? - Ryan Reynolds
Why? Simply because rsyslog only understands numbers and strings as datatypes on one end, and mongodb does not auto-detect dates and timestamps on the other end. Is it an issue in the end? If you want to benefit from mongodb's filtering features on dates, yes it is. For that purpose, you need the ISODate() function, which only mongodb knows about.
After a tremendous number of attempts, trying to deal with the documentation to find the proper format, I decided to read the ommongodb module source code. Pretty easy, as it’s well-written C code:
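Why a date stored as a string hurts filtering can be shown without a database. The sketch below assumes ISO 8601 timestamps that carry a UTC offset; the two values are made up for illustration. Compared as strings they sort one way, compared as real dates they sort the other way.

```python
from datetime import datetime

# Two hypothetical timestamps with different UTC offsets.
first = "2024-01-01T00:30:00+02:00"   # 22:30 UTC on 2023-12-31
second = "2023-12-31T23:00:00+00:00"  # 23:00 UTC on 2023-12-31

# Lexicographic comparison, which is all a string-typed field can offer.
string_says_first_is_earlier = first < second

# Comparison on real, timezone-aware datetimes.
date_says_first_is_earlier = datetime.fromisoformat(first) < datetime.fromisoformat(second)

print(string_says_first_is_earlier, date_says_first_is_earlier)
```

A range query on a string field follows the first comparison, so a billing window could silently include or exclude the wrong requests; a native date type follows the second.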
You read that right: the code does not expect the fieldname to start with a
time, but it expects the fieldname to be
Two solutions there:
- either update the log format, in nginx configuration, from
- update your rsyslog configuration
I’m not fond of the string exporter used in the default rsyslog documentation for mongodb: we’re using JSON as input, we expect JSON as output, so why should we use strings in between?
For that reason, I moved to JSON manipulation, thanks to the JSON parse module and a list-type template:
Now we can talk. - Kyle Maclachlan - Twin Peaks
That’s not all folks!
Following those changes, nginx logs are pushed to mongodb, allowing easy statistics aggregation for billing purposes. As we just want to push these lines to mongodb, the best approach is to proceed with something like:
uristr is a bit different from the documentation because, once again, the documentation is not really explicit enough about it. For most real-life scenarios, even if you use a mongodb cluster, you want to rely on a dedicated database and a dedicated user with the proper set of permissions. To benefit from this, you need to add some details to the uristr: the user:password, but also some query parameters:
- authSource: the db to rely on for authentication, as your user only has permissions on it
- authMechanism: you’re pushing a password via the DSN
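As a rough illustration of how such a connection string is assembled, here is a small Python sketch. The helper name, host, database, user, and password are all hypothetical, and SCRAM-SHA-256 is only one of the mechanisms MongoDB accepts; use whatever your deployment is configured for. The one detail worth noting is that the user and password must be percent-encoded if they contain reserved characters such as `@` or `/`.

```python
from urllib.parse import quote_plus, urlencode


def build_uristr(user, password, hosts, database, auth_source, auth_mechanism):
    """Assemble a mongodb:// URI with credentials and auth query parameters."""
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    query = urlencode({"authSource": auth_source, "authMechanism": auth_mechanism})
    return f"mongodb://{credentials}@{','.join(hosts)}/{database}?{query}"


# All values below are placeholders, not a real deployment.
uri = build_uristr(
    user="billing",
    password="p@ss/word",
    hosts=["db1.example.net:27017"],
    database="logs",
    auth_source="logs",
    auth_mechanism="SCRAM-SHA-256",
)

print(uri)
```

For a cluster, pass several entries in `hosts`; they are joined with commas as the MongoDB URI format expects.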
At the end
We made it through. We set up nginx to push @cee-compliant JSON logs to syslog, then we prepared the logs to be properly pushed to mongodb, and we published them.
Now our folks can run their micro-batching to generate live billing and usage statistics for the customers.
How would you have tackled this kind of need? Have you suffered from technical documentation that was inadequate or out of date?