Let’s start with a common situation.
You’ve just created a new application in Node.js. The application uses a SQL database, Redis for caching, Kafka as a message broker, and many other useful things.
You are ready to deploy your masterpiece to production, so you choose a hosting provider, set up a domain, and finally deploy the application. Everything works fine so far.
But after a few days, you start getting calls and emails from your clients. The application is getting slower and slower. You look at your infrastructure metrics and see memory and CPU spikes, but when you look at your APM you see no unusual spikes in requests; everything looks the same as yesterday. At this point, you are blind.
This hypothetical case sounds scary, but usually a restart fixes the problem and buys you some time to investigate what is actually happening.
We could avoid this situation, at least partially, if we had enabled profiling in production. But what the heck is profiling? Let’s get familiar with the definition:
Profiling, in a general sense, refers to the process of monitoring and analyzing how a computer program utilizes CPU and memory resources. It is a crucial technique for identifying performance bottlenecks, optimizing code, and enhancing the efficiency of software applications during the development and testing phases.
So in general, we want to track how our application uses memory and the processor, and which parts of it suffer from problems like:
- Memory leaks
- Processor abuse
- Slow connections
- I/O bottlenecks
- Ineffective caching
- Third-party library issues
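Before reaching for a dedicated profiler, it can help to see what "tracking memory and processor usage" looks like at the most basic level. The sketch below only uses Node.js built-ins (`process.memoryUsage()` and `process.cpuUsage()`); the helper names `takeResourceSnapshot` and `startResourceLogging` are made up for illustration. It gives process-wide numbers only and does not tell you which function is responsible, which is exactly the gap a profiler fills.

```javascript
// Sketch: sample process-level memory and CPU usage with Node.js built-ins.
const BYTES_PER_MB = 1024 * 1024;

// Returns a snapshot of memory and CPU usage.
// If previousCpu (a raw process.cpuUsage() reading) is passed,
// the CPU values are the delta since that reading.
function takeResourceSnapshot(previousCpu) {
  const memory = process.memoryUsage();
  const cpu = process.cpuUsage(previousCpu); // values are in microseconds
  return {
    rssMb: memory.rss / BYTES_PER_MB,
    heapUsedMb: memory.heapUsed / BYTES_PER_MB,
    cpuUserMs: cpu.user / 1000,
    cpuSystemMs: cpu.system / 1000,
  };
}

// Hypothetical helper: logs a snapshot every intervalMs milliseconds.
function startResourceLogging(intervalMs = 60000, log = console.log) {
  let lastCpu = process.cpuUsage();
  const timer = setInterval(() => {
    log(takeResourceSnapshot(lastCpu));
    lastCpu = process.cpuUsage();
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for logging
  return timer;
}
```

Calling `startResourceLogging()` once at startup would print one snapshot per minute; a steadily growing `heapUsedMb` is a hint of a memory leak.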
Let’s see what kind of tools we can use in production to measure and profile the application.
Google Cloud Profiler
Cloud Profiler is a statistical, low-overhead profiler that continuously gathers CPU usage and memory-allocation information from your production applications. It attributes that information to the source code that generated it, helping you identify the parts of your application that are consuming the most resources, and otherwise illuminating your application’s performance characteristics.
Google Cloud Profiler has been a good friend of mine when it comes to finding what function or class has been naughty.
But as with any tool, it has some flaws, like showing you… too much
Whole application heap
But at the end of the day, it does its job by helping you track resource usage in your own code and in third-party libraries
(Usage of processor - wall time)
(Usage of memory - heap)
Ability to see top CPU and memory functions
Ability to compare profiles in time and by versions
It’s basic, but it works. You can see which function or third-party library is eating your resources and then debug it locally (for example, by using ClinicJS - https://clinicjs.org/).
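If you want to reproduce a suspicious function locally without installing anything, one option is Node's built-in `inspector` module. This is not ClinicJS and not what Cloud Profiler does internally; it is just a minimal sketch of capturing a CPU profile around a piece of code. The helper names `post` and `captureCpuProfile` are made up for this example.

```javascript
// Sketch: capture a CPU profile of a workload with the built-in inspector.
const inspector = require('node:inspector');

// Small promise wrapper around the callback-based session.post().
function post(session, method, params = {}) {
  return new Promise((resolve, reject) => {
    session.post(method, params, (err, result) => {
      if (err) reject(err);
      else resolve(result);
    });
  });
}

// Runs workload() while the V8 CPU profiler is recording
// and resolves with the resulting profile object.
async function captureCpuProfile(workload) {
  const session = new inspector.Session();
  session.connect();
  try {
    await post(session, 'Profiler.enable');
    await post(session, 'Profiler.start');
    await workload();
    const { profile } = await post(session, 'Profiler.stop');
    return profile;
  } finally {
    session.disconnect();
  }
}
```

The returned object can be written to a `.cpuprofile` file with `JSON.stringify(profile)` and opened in Chrome DevTools to inspect it as a flame chart.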
Pros:
- It’s cheap, you can use it in as many instances as you want
- Easy to install
- You only need to install this as a package and include it in your main file
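As a rough idea of what "include it in your main file" means, here is a minimal sketch. It assumes the `@google-cloud/profiler` package is installed and that the application runs with Google Cloud credentials available. The environment variable names (`SERVICE_NAME`, `SERVICE_VERSION`, `ENABLE_CLOUD_PROFILER`) and the default values are placeholders chosen for this example, not something the library requires.

```javascript
// Sketch: start Google Cloud Profiler at the top of the main file.

// Builds the profiler config; service and version are the labels
// under which profiles are grouped in the Cloud Profiler UI.
function buildProfilerConfig(env = process.env) {
  return {
    serviceContext: {
      service: env.SERVICE_NAME || 'my-service', // placeholder name
      version: env.SERVICE_VERSION || '1.0.0', // placeholder version
    },
  };
}

// The package is required lazily so the app still boots
// in environments where profiling is switched off.
async function startCloudProfiler(env = process.env) {
  const profiler = require('@google-cloud/profiler');
  await profiler.start(buildProfilerConfig(env));
}

// ENABLE_CLOUD_PROFILER is a hypothetical switch for this example.
if (process.env.ENABLE_CLOUD_PROFILER === 'true') {
  startCloudProfiler().catch((err) => {
    console.error('Cloud Profiler failed to start', err);
  });
}
```

Setting the version from your deployment pipeline is what later makes comparing profiles by version possible.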
Cons:
- UI is very laggy
- Wall time is sometimes not useful without many filters
- Lack of correlations with infrastructure
- Captures a profile every 10 minutes
- Higher memory usage because it stores temporary profiles in application memory
Link to Google Cloud Profiler https://cloud.google.com/profiler/docs/about-profiler
DataDog
DataDog is a far bigger player when it comes to APM and infrastructure monitoring, as well as application profiling. To be honest, DataDog is currently my number one tool for profiling in the Node.js environment.
And there is a simple explanation for that - they understand how Node.js works and adjust profiling to show you what you really need to see.
First, they let you see the timeline for wall time and heap usage, so you can spot that something is off at first glance.
Second, they allow you to see only your code. Moreover, you can see at what point in time each profile was captured.
What I also find quite useful is that the UI is not laggy at all, even when you have hundreds of services in it.
Pros:
- Smart and fast UI
- Captures a profile every minute
- Correlations with infrastructure, logs, etc.
- Adjusted to Node.js needs, you only see what’s important
Cons:
- Kinda expensive, 48 USD per host + 18 USD per infra
- You need to install the agent on the infrastructure and then install the package in your code in order to communicate with the agent
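The code side of that setup is small. The sketch below assumes the DataDog agent is already running and reachable from the application, and that the `dd-trace` package is installed. `DD_SERVICE`, `DD_ENV`, `DD_VERSION` and `DD_PROFILING_ENABLED` are DataDog's standard environment variables; the fallback values and the helper names are placeholders for this example.

```javascript
// Sketch: enable the DataDog profiler through the dd-trace package.

// Builds the options passed to dd-trace; profiling: true turns the profiler on.
function buildDatadogOptions(env = process.env) {
  return {
    profiling: true,
    service: env.DD_SERVICE || 'my-service', // placeholder name
    env: env.DD_ENV || 'production', // placeholder environment
    version: env.DD_VERSION || '1.0.0', // placeholder version
  };
}

// The package is required lazily so the app still boots without it.
// In a real app this should run before any other module is loaded.
function startDatadogProfiler(env = process.env) {
  return require('dd-trace').init(buildDatadogOptions(env));
}

if (process.env.DD_PROFILING_ENABLED === 'true') {
  startDatadogProfiler();
}
```

Running this as the very first thing in the main file matters, because the library needs to be initialized before the modules it instruments are imported.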
Link to DataDog profiler: https://docs.datadoghq.com/profiler/
So as you can see, these tools can drastically reduce the time you spend debugging and make your application and clients happy again.