When dealing with log management, obvious solutions come up early, often before you have even discussed what the log management is for. We can talk about ELK, Graylog, Splunk, … Those are great tools, but they may not fit all your needs. Recently, I had to work for one of my customers on log management for billing purposes. Rsyslog was already set up to collect and centralize all the logs (and manage their backups). MongoDB seemed like a perfect fit for storing a JSON extract of the logs to generate the proper stats.
JSON is standard
From json.org:
JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language Standard ECMA-262 3rd Edition - December 1999. JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. These properties make JSON an ideal data-interchange language.
On top of that format, we can use a more structured syntax adapted to log management: the CEE Log Syntax. This format is supported by the various tools involved in the present case: nginx, rsyslog, and MongoDB. Up to a certain point…
Nginx knows about @cee
Sort of. Nginx has native syslog export for both error_log and access_log. Out of the box, it simply pushes all logs from the default log formats to syslog, which is not very convenient for proper processing afterwards. But nginx is easy to configure, and you can define your own custom format. You’ll find plenty of blog posts and tutorials, such as Any IT here? Help me!, describing the @cee JSON log format you need:
They all give you the very same structure. I’m not sure who started it, but that’s not the point: they’re all wrong about it!
They are close, indeed, but some details have to be fixed first. Why are they wrong? Because JSON, and especially its @cee variant, has datatypes. Numbers (integers and floats) must not be wrapped in quotes: quotes are for strings only.
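As a sketch of what a corrected format could look like (field and format names here are illustrative, not taken from any particular tutorial), the numeric nginx variables are left unquoted so they land in the JSON document as numbers:

```nginx
# Numeric variables ($status, $body_bytes_sent, $request_time)
# are NOT quoted, so they become JSON numbers, not strings.
# escape=json (nginx >= 1.11.8) keeps string values valid JSON.
log_format mongolog escape=json '@cee: {'
    '"vhost":"$host",'
    '"remote_addr":"$remote_addr",'
    '"time_iso8601":"$time_iso8601",'
    '"request":"$request",'
    '"status":$status,'
    '"body_bytes_sent":$body_bytes_sent,'
    '"request_time":$request_time,'
    '"http_user_agent":"$http_user_agent"'
'}';
```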
Now we are good to go. Then, as all the documentation states, we can use this log format to push logs to syslog:
Of course, you can adapt the facility, the tag and the severity, along with the log format name.
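A minimal sketch of the corresponding access_log directive, assuming the local rsyslog listens on /dev/log and the format is named mongolog (both are assumptions):

```nginx
# Ship access logs to the local syslog socket using the custom
# format; facility, severity and tag are freely adjustable.
access_log syslog:server=unix:/dev/log,facility=local5,tag=nginx,severity=info mongolog;
```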
Rsyslog is great. Its documentation is not.
That’s a fact: technical documentation, written by technical people for technical people, is rarely the best. Rsyslog might be one of the best examples of this statement. Rainer Gerhards did a great job with his tool, one of the best syslog managers in my humble opinion. Though, the documentation is hell to read. Add to that the fact that it is not as accurate as it should be.
In the use case I was working on, the infrastructure benefits from rsyslog v8-stable. And as stated, we want to use the MongoDB output module, aka ommongodb. Seems easy and straightforward? Almost. But remember this IT mantra:
If all goes well, we forget something.
For reasons that escape me while writing this post, I can tell you that it just does not work out of the box. Several aspects have to be considered before dealing with those logs.
- You need to take care of the encoding: BSON (the JSON variant MongoDB uses internally) only supports valid UTF-8 characters. You need to properly fix the encoding before sending the logs. There is a module for that.
- The structured @cee data sent to MongoDB using the default configuration is pushed “as-is”: the first and main issue you’ll hit is that the date ends up stored as a regular string.
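For the encoding point, the rsyslog module in question is mmutf8fix, which replaces invalid UTF-8 sequences before the logs reach the output. A minimal sketch (its exact placement depends on your ruleset):

```rsyslog
# Load mmutf8fix and run it on incoming messages so only
# valid UTF-8 reaches ommongodb/BSON.
module(load="mmutf8fix")
action(type="mmutf8fix")
```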
Why? Simply because rsyslog only understands numbers and strings as datatypes on one end, and MongoDB doesn’t auto-detect dates and timestamps on the other. Is it an issue in the end? If you want to benefit from MongoDB’s filtering features on dates, yes it is. For that purpose, you need real ISODate() values, which only MongoDB knows about.
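To illustrate why this matters (collection and field names below are assumptions), a date-range query only works against real BSON Dates; MongoDB comparisons are type-bracketed, so a Date range query against a field stored as a string simply matches nothing:

```javascript
// Works only if "date" is a BSON Date (ISODate), not a string.
db.nginx.find({
  date: {
    $gte: ISODate("2023-01-01T00:00:00Z"),
    $lt:  ISODate("2023-02-01T00:00:00Z")
  }
})
```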
After a tremendous number of attempts trying to find the proper format in the documentation, I decided to read the ommongodb module source code. Pretty easy, as it’s well-written C code:
You’ve read it right: the code does not expect the field name to start with “date” or “time”, it expects the field name to be exactly “date” or “time”.
Two solutions there:
- either update the log format, in the nginx configuration, renaming the “time_iso8601” field to “date”,
- or update your rsyslog configuration.
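A sketch of the first option, assuming the illustrative format shown earlier (only the field name changes, so ommongodb can recognize it and convert the value to a BSON Date):

```nginx
# Name the timestamp field "date" instead of "time_iso8601"
# so ommongodb turns it into a real BSON Date.
log_format mongolog escape=json '@cee: {'
    '"date":"$time_iso8601",'
    '"status":$status'
'}';
```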
I’m not fond of the string template used in the default rsyslog documentation for MongoDB: we’re using JSON as input, we expect JSON as output, so why should we use strings in between?
For that reason, I moved to JSON manipulation, thanks to the JSON parse module (mmjsonparse), and a list-type template:
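A sketch of such a configuration, under the assumption that the fields match the nginx format shown earlier (the template and field names are illustrative): mmjsonparse parses the @cee payload into the $! property tree, and the list template rebuilds the document field by field.

```rsyslog
# Parse the @cee JSON payload into rsyslog's $! tree.
module(load="mmjsonparse")
action(type="mmjsonparse")

# Rebuild the document as name/value pairs (format="jsonf")
# instead of one big string.
template(name="mongo-cee" type="list") {
    property(outname="vhost"  name="$!vhost"  format="jsonf")
    property(outname="date"   name="$!date"   format="jsonf")
    property(outname="status" name="$!status" format="jsonf")
}
```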
That’s not all, folks!
Following these changes, nginx logs are pushed to MongoDB, allowing easy statistics aggregation for billing purposes. As we just want to push these lines to MongoDB, the best is to proceed with something like:
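A sketch of the matching ruleset (the tag, database, collection and template names are assumptions consistent with the earlier examples): only the nginx-tagged lines are sent to MongoDB, and processing stops there.

```rsyslog
module(load="ommongodb")

if ($syslogtag startswith "nginx") then {
    action(type="mmjsonparse")
    action(type="ommongodb"
           uristr="mongodb://127.0.0.1:27017/logs"
           db="logs" collection="nginx"
           template="mongo-cee")
    stop
}
```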
The uristr is a bit different from the documentation because, once again, the documentation is not really explicit enough about it. For most real-life scenarios, even if you use a MongoDB cluster, you want to rely on a dedicated database and a dedicated user with the proper set of permissions. To benefit from it, you need to add some details within the uristr, such as the user:password part, but also some query parameters:
- authSource: the database to rely on for authentication, as your user only has permissions on it
- authMechanism: you’re pushing a password via the DSN
In the end
We went through it all. We set up nginx to push @cee-compliant JSON logs to syslog, then we prepared the logs to be properly pushed to MongoDB, and we published them.
Now our folks can run their micro-batching to generate live billing and usage statistics for the customers.
How would you have tackled this kind of need? Have you ever suffered from technical documentation that was inadequate or not up to date?