How to Overcome the 5 Biggest Node.js Mistakes
Posted by Fernando Doglio on September 18, 2023
Node.js (or rather, JavaScript itself) is a very accessible language: it has a gentle learning curve and there is plenty of documentation around to assist you. What’s more, since it is such a foundation of web development, it’s a must-learn for anyone trying to become either a front-end developer or a full-stack dev.
That being said, the language has a few differences from other web-related languages (such as Python, PHP or even Java) that might confuse newcomers. In this article, I’ll cover the five most common problems newcomers have with Node.js and how to work around them.
1. Blocking the event loop
Node.js is known for its asynchronous behavior; it’s one of its main selling points, after all. But that also means it can be confusing to understand how the event loop, the core construct that queues and dispatches work, should be leveraged.
Typically, given the nature of the event loop, any async-based behavior is perfectly fine: the work is sent off to be handled internally and the callback is queued up for when it finishes (this is an oversimplification of the process, but it’s enough to get the idea). This means anything I/O-related works as expected and has no significant effect on our event loop.
CPU-intensive tasks, however, which can’t be offloaded through any kind of asynchronous interface, do affect it, and there are multiple operations that can block the event loop. For example:
- ReDoS (Regular Expression Denial of Service) patterns
- Operations ending with Sync, such as fs.readFileSync, crypto.randomFillSync, zlib.inflateSync and so on. You get the point: they all work synchronously, so you’re not taking advantage of the power of the event loop.
- JSON operations with big files
Let me elaborate:
ReDoS
These are regular expressions that, because of how they’re written, result in heavy CPU usage. Simply put, we normally tend to think of regular expressions as having O(n) complexity, meaning they only need a single pass over the analyzed string. However, these particular expressions are built in a way that gives them O(2^n) complexity instead.
Sadly, it’s hard to tell when you’re dealing with a ReDoS-prone pattern, since many constructs can cause the problem, so as a general rule of thumb, try to avoid the following cases:
- Avoid nested quantifiers like (a+)*. Node’s regexp engine can handle some of these quickly, but others are vulnerable.
- Avoid ORs with overlapping clauses, like (a|a)*. Again, these are sometimes-fast.
- Avoid using backreferences, like (a.*) \1. No regexp engine can guarantee to evaluate these in linear time.
- If you’re doing a simple string match, use indexOf or the local equivalent. It will be cheaper and will never take more than O(n).
It’s important to note that when testing for ReDoS, matches don’t really expose the vulnerability; it’s mismatches you should be looking for. After all, the engine can’t be sure the pattern doesn’t match until it performs several passes over the string.
So, to illustrate the problem, consider the following expression:
/(\/.+)+$/
It violates rule #1 from above, yet with a string such as:
let str = "///"
the following code returns an immediate match:
str.match(/(\/.+)+$/)
However, add just a few characters at the end of the original string and you get a mismatch:
str = "///\ntest"
The added line break, which the “.” character does not match by default, forces the engine to backtrack heavily, and the event loop gets blocked. You can easily test it in Node’s REPL; just make sure you don’t have the laptop on your lap at the moment, because the temperature is going to rise like crazy!
This vulnerability is particularly bad if a vulnerable pattern is applied to user input, since it becomes a security problem. So make sure you’re aware of them.
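When all you actually need is a simple containment or suffix check, the indexOf advice from the list above applies directly; a minimal sketch (the strings are just illustrations):

```javascript
// A plain substring search runs in O(n) and can never backtrack,
// unlike a nested-quantifier pattern such as /(\/.+)+$/.
const input = "///\ntest";

const hasSlash = input.indexOf("/") !== -1; // safe containment check
const endsWithTest = input.endsWith("test"); // safe suffix check

console.log(hasSlash, endsWithTest); // prints "true true"
```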
*Sync functions
These are very straightforward: as a general rule, if you’re worried about performance or the health of your event loop, avoid the methods whose names end in “Sync”. That might be an obvious statement for some of you, but newcomers to Node might reach for these functions when they don’t yet get the asynchronous behavior.
If this is you, just keep in mind that Node.js is meant to take advantage of asynchronous behavior, so these helper methods and functions aren’t the way to go. It doesn’t matter that the native API (and some external modules) provides them; you should try to avoid them.
Big JSON operations
JSON.stringify and JSON.parse are often used without considering the size of the JSON you’re working with. We normally tend to assume these are fast methods, but in reality they’re synchronous methods that were not designed with size in mind.
Here is a simple example of what I mean:
let obj = { a: 1 };
let size = 22;
let before, str, pos, res, took;

for (let i = 0; i < size; i++) {
  obj = { obj1: obj, obj2: obj }; // the stringified size doubles on each iteration
}

before = process.hrtime();
str = JSON.stringify(obj);
took = process.hrtime(before);
console.log('JSON.stringify took ' + took);

before = process.hrtime();
pos = str.indexOf('nomatch');
took = process.hrtime(before);
console.log('Pure indexof took ' + took, ' Size: ', Buffer.byteLength(str, 'utf8') / (1024 * 1024), ' Mb');

before = process.hrtime();
res = JSON.parse(str);
took = process.hrtime(before);
console.log('JSON.parse took ' + took);
The output of that script on my laptop is:
JSON.stringify took 1,227829893
Pure indexof took 0,49462899  Size:  95.99998378753662  Mb
JSON.parse took 1,525785322
If I increase the size variable by just one, those numbers grow considerably, so be careful with your tests. The point being that you’re not completely safe using these functions: you can and will block the event loop if you’re dealing with external or dynamically generated JSON that needs to be parsed or stringified.
If you want to be safe about it, check out libraries such as JSONStream and bfj (Big-Friendly JSON), which were designed with this in mind.
2. Callback confusion: calling the same callback several times
Leaving aside the event loop for a bit, another very common mistake we’ve all made is calling our asynchronous callbacks several times within the same execution cycle.
I’m not referring to the function we pass as a callback to an async API, but rather to the callbacks we receive in a custom function we created.
Remember, this is Node.js and asynchronous behavior is king, so callbacks are ever-present. Consider the following code:
const fs = require("fs")

function readAFile(fname, done) {
  fs.readFile(fname, (err, cnt) => {
    if(err) {
      done("Error: There was a problem reading the file: ", err)
    }
    // ... process the cnt returned
    done(null, cnt)
  })
}
At first glance it might look correct, and it will run since there is nothing syntactically wrong with it. However, the moment there is a problem reading the file, it will call the callback twice!
Now, this is a very common error and it has a simple fix: either change the IF into an IF-ELSE so only one of the calls happens, or add a return statement to each callback line:
const fs = require("fs")

function readAFile(fname, done) {
  fs.readFile(fname, (err, cnt) => {
    if(err) {
      return done("Error: There was a problem reading the file: ", err)
    }
    // ... process the cnt returned
    return done(null, cnt)
  })
}
Now the problem is fixed!
Do note, however, that the return statement is not meant to hand back a value you can use. This is an asynchronous function, after all; there is nothing to gain from assigning the result of readFile to a variable. It is purely there to tell the interpreter that execution ends on that line, so nothing below it should run.
There is another variation of this common problem, however, that is a bit more elaborate.
Event-based libraries usually allow you to set callbacks for particular events with an on method.
There are many libraries that follow this pattern, but in essence, you’d see developers writing code like this:
myServer.on('request', (connData) => {
  // .... handle a generic request from any client
  myServer.on('new-message', (newMessage) => {
    // ... handle messages received
  })
})
The intention of the code is to set up a message handler for users that have sent a new request to the system. The problem? You’re re-registering the handler every time a new request is received. This does not overwrite the previous handler; it adds a new one next to it. And now you see the problem: once a ‘new-message’ event is triggered, many handlers fire. In essence, that particular callback gets called many times.
This version of the problem is arguably harder to solve because you need to understand how the on method works before you can find a workaround for it.
As you’ve probably guessed, the solution for this problem is to take the code for the ‘new-message’ event out, like this:
myServer.on('request', (connData) => {
  // .... handle a generic request from any client
})

myServer.on('new-message', (newMessage) => {
  // ... handle messages received
})
There are many different ways to set up handlers, depending on the API of the library you’re using, so make sure you understand how they work before trying to write any kind of complex logic with them.
3. Ignoring asynchronous behavior
Anyone coming into Node.js from a language other than JavaScript (i.e. anyone who isn’t already a front-end developer) will have some issues adjusting to the asynchronous nature of the language. It’s not super complicated, but rather a bit counterintuitive until you get used to it.
Essentially, most people would assume reading a file, for example, would be done in a similar manner to this:
let content = fs.readFile("./your-file.txt")
console.log(content)
However, if you’ve worked with Node for a bit, you know that’s completely wrong. Here is another common attempt, which is also wrong:
let filecontent = "";

fs.readFile("./your-file.txt", (err, content) => {
  filecontent = content;
})

console.log(filecontent);
Again, right intentions, wrong execution: you’re using the callback, as you should, but you’re not taking into account the event loop, the order of execution, and the async nature of the code.
You need to remember that every time you set up an async handler (i.e. a callback), that function will never be the next thing executed, even with something like:
console.log("Test 1");

setImmediate(() => {
  console.log("Test 2");
})

console.log("Test 3");
The output would be:
Test 1
Test 3
Test 2
If the callback-based syntax is not ideal for you (for whatever reason), there are alternatives that still deal correctly with the async nature of the language, such as promises or the newer async/await constructs. They can ease you into async behavior without having to mentally reorder the code (especially async/await; you should check it out).
4. Mixing callbacks with try-catch statements
Related to the previous mistake, Node.js (or in this case, JavaScript in general) can send the wrong message to the developer by providing throw and try-catch while at the same time requiring callbacks.
Those coming from languages such as Java or C++ might be especially prone to this particular mistake, since try-catch blocks are encouraged in those languages.
The problem stems from expecting errors to bubble up from within the callbacks where they are thrown and still be caught by the try block wrapped around them.
Here, let me show you:
const fs = require("fs")

try {
  fs.readFile("./your-file.txt", _ => {
    throw(new Error("Testing error messages here..."))
  })
} catch (e) {
  console.log("--------------------------------------------------------")
  console.log("Error caught: ", e)
  console.log("--------------------------------------------------------")
}
If you ignore everything I’ve said so far about asynchronous behavior and callback handling, you would probably expect the code above to print the three lines outlining the error message. However, if you execute it, you’ll see something like this:
throw(new Error("Testing error messages here..."))
^
Error: Testing error messages here...
at fs.readFile._ (/home/fernando.doglio/workspace/personal/writing/buttercms/nodejs-mistakes/throw.js:5:8)
at FSReqWrap.readFileAfterClose [as oncomplete] (internal/fs/read_file_context.js:53:3)
You’re seeing the error, but only because Node.js didn’t know how to handle the uncaught exception, so it crashed. The catch block is not being called, and here is why:
Inside the try block you’re setting up the callback for when the file is read, but by the time that function is triggered, execution has already left the try-catch block; it is no longer part of the active context. Thus, the thrown exception escapes uncaught.
There is actually an error-handling pattern proposed by Node.js itself, and it’s based on callbacks (I know, that word keeps coming back; you’ll need to embrace them eventually).
Essentially, you’ll notice that all of Node’s methods that receive a callback pass any error as the first argument when calling it.
This is known as the “error-first callback” pattern, and it’s idiomatic across the Node ecosystem. A way to fix the code above is to extract the error handling and the content-handling logic into an external function that receives two parameters: the error first and the actual content second:
function cntHandler(err, cnt) {
  if(err) {
    console.log("------------------------------------------------")
    console.log("Error caught: ", err)
    console.log("------------------------------------------------")
    return
  }
  // here is your content handling code
}

fs.readFile("./your-file.txt", (err, cnt) => {
  if(err) return cntHandler(err)
  cntHandler(null, cnt)
})
And if you wanted to take it one step further, you could simplify the readFile call into the following, and it would still work because cntHandler follows the same error-first signature Node.js expects:
fs.readFile("./your-file.txt", cntHandler)
If you stick to callbacks, please make sure you follow this pattern. Since it’s what the platform itself uses, you’ll see it tends to simplify your code a lot.
5. Ignoring CORS while working on your API
When developing your APIs, you’ll normally test them from your own browser, especially if you already have the client application ready (or in development). In these situations you’re “publishing” your service locally and testing it from the same computer, so there is no problem. However, browsers apply a layer of security over interactions with external services (the word “external” being key here).
Because you’re testing locally, you might be aiming your API calls at URLs such as http://localhost:3000 or http://127.0.0.1:8080 (whatever the port, the host is what matters for this problem) while also publishing your client app locally (e.g. at http://localhost ). From the browser’s point of view this scenario is perfectly safe: the hostname is the same, so there is no reason to distrust these calls.
However, if you were to set up a different domain name in your hosts file, essentially making the browser think it is calling from one domain to another (even though both resolve to localhost), you’d see your calls start getting canceled. Yes, canceled by the browser; they don’t even reach your API. This is because the browser works like this:
- It sends an OPTIONS request to the URL, with the Origin header set to the requesting page’s domain, expecting only headers back.
- If the target is a different host from where the request originates, it looks for a specific response header called Access-Control-Allow-Origin.
- If the requesting origin does not match the value of that header, the browser will not allow the actual request to go through.
It is worth noting that the above behavior applies to “preflighted” requests, i.e. requests that look out of the ordinary (custom headers, unusual content types and so on); simpler requests don’t trigger a preflight.
This behavior goes by the name of CORS (Cross-Origin Resource Sharing), and it’s basically a protocol that allows a client and a server on two different domains to communicate.
If you’re building a public API, a good rule of thumb is to make sure anyone can use it, which you achieve by replying to all requests with the Access-Control-Allow-Origin header set to “*”, meaning any domain is capable of interacting with the service.
If, on the other hand, you want to make sure only specific requests can reach you, filtering them using these headers is the way to go.
As for solving this problem, there are many ways of doing it, but the end result needs to be the same: the header needs to be sent back.
As for code, every library has its own way of doing it, so there are plenty of examples to choose from. For this particular case, I’ll showcase the cors middleware for Express:
var express = require('express')
var cors = require('cors')
var app = express()

app.use(cors())

app.get('/products/:id', function (req, res, next) {
  res.json({msg: 'This is CORS-enabled for all origins!'})
})

app.listen(80, function () {
  console.log('CORS-enabled web server listening on port 80')
})
By default, this module allows all requests through. If you want to filter some of them out instead:
var express = require('express')
var cors = require('cors')
var app = express()

var corsOptions = {
  origin: 'http://example.com',
  optionsSuccessStatus: 200 // some legacy browsers (IE11, various SmartTVs) choke on 204
}

app.get('/products/:id', cors(corsOptions), function (req, res, next) {
  res.json({msg: 'This is CORS-enabled for only example.com.'})
})

app.listen(80, function () {
  console.log('CORS-enabled web server listening on port 80')
})
Conclusion
Blocking the event loop, callback confusion, ignoring asynchronous behavior, mixing callbacks with try-catch, and ignoring CORS are the most common mistakes I’ve seen developers make when working (or rather, starting to work) with Node.js. And trust me, we’ve all been there; if you’re making one or more of them, don’t feel bad, it’s just part of the job, like running a DELETE statement without a WHERE clause or deleting that file from the production server rather than from your dev environment.
With that being said, the key takeaway from this article shouldn’t be that you’re doing something wrong; it should be that you’re trying to improve. So please, as a final step in your learning process, share your own experience with these kinds of mistakes in the comments section. What mistakes did you run into? Did I leave any big ones out? Let me know!
Otherwise, see you on the next one!
Fernando Doglio has been working as a Web Developer for the past 10 years. In that time, he's come to love the web, and has had the opportunity of working with most of the leading technologies of the time, such as PHP, Ruby on Rails, MySQL, Node.js, Angular.js, AJAX, REST APIs and others. He can be contacted on Twitter at @deleteman123, or you can read more about him and his work at www.fdoglio.com