Node.js Streams in Action

Dan Stenger
4 min read · Jan 10, 2021

Ever wondered how streams work in Node.js? I had. So far I know that using streams will most certainly reduce memory usage on the server when processing large files. Instead of reading a whole file into memory, we stream it in chunks to whoever requested it and apply transformations to that stream if needed. That's a huge benefit, as it helps avoid vertical scaling. Processing files is a common task, but not as common as interacting with databases, and that's going to be my focus here. I'll build a simple API with two endpoints, both returning a fairly large number of records from a Postgres DB. One endpoint will stream data to the client, and the other will read all the data into memory and return it in one chunk.

When I work in a Node.js environment, there's usually the Express.js framework in front of it with the Sequelize ORM. For this experiment I decided to try something different and chose the Koa.js framework and the Knex query builder. Let's start.

I'll set up the project first:
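Since the setup snippet isn't shown inline, here's roughly how it could look; the package choices follow the article, but the project name and exact package list are assumptions (the real list is linked below):

```shell
# Hypothetical bootstrap; see the linked repo for the actual dependencies
mkdir streams-in-action && cd streams-in-action
npm init -y
npm install koa @koa/router knex pg dotenv
```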

Now it's time to add the app dependencies. You can see all packages here. To set up my development environment I'll use Docker and docker-compose. Let's see what Dockerfile.dev looks like:
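The file itself isn't shown inline, so here is a minimal sketch of what a Dockerfile.dev for this stack might contain; the base image and the dev command are assumptions:

```dockerfile
# Sketch of Dockerfile.dev; base image and commands are guesses
FROM node:14-alpine

WORKDIR /usr/src/app

# install dependencies first so Docker can cache this layer
COPY package*.json ./
RUN npm install

COPY . .

CMD ["npm", "run", "dev"]
```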

Pretty standard, right? Now docker-compose.yml to add the DB dependency and bootstrap the whole thing:
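A sketch of how that docker-compose.yml could look. The `pg` service name matches what the article mentions later, and port 5432 is published so migrations can run from the host; everything else is an assumption:

```yaml
# Sketch of docker-compose.yml; service layout is a guess except for
# the `pg` service name mentioned in the article
version: "3.8"
services:
  api:
    build:
      context: .
      dockerfile: Dockerfile.dev
    ports:
      - "3000:3000"
    env_file: .env
    depends_on:
      - pg
  pg:
    image: postgres:13-alpine
    env_file: .env # POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_DB
    ports:
      - "5432:5432" # published so host-side migrations can reach the DB
```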

Now that the environment is pretty much ready, I'll continue with the application logic itself. Let's look at app.js:
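Based on the description that follows, app.js could look roughly like this; the file paths and port are assumptions:

```javascript
// app.js — a sketch; module paths and port are assumptions
require('dotenv').config();
const Koa = require('koa');
const errorHandler = require('./middleware/errorHandler');
const router = require('./router');

const app = new Koa();

app.use(errorHandler); // added first so it wraps everything downstream
app.use(router.routes());
app.use(router.allowedMethods()); // 405 Method Not Allowed for unsupported methods

app.listen(process.env.PORT || 3000);
```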

I first load the .env vars, then import Koa, my custom error-handling middleware and the router. The error-handling middleware looks like this:
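A minimal sketch of such a middleware; the `{ error: message }` response shape is my assumption:

```javascript
// middleware/errorHandler.js — a sketch of a Koa error-handling
// middleware: it awaits the rest of the chain and converts any
// thrown error into an HTTP response.
const errorHandler = async (ctx, next) => {
  try {
    await next();
  } catch (err) {
    // ctx.throw sets err.status; default to 500 for plain errors
    ctx.status = err.status || 500;
    ctx.body = { error: err.message };
  }
};

module.exports = errorHandler;
```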

According to the documentation, error-handling middleware has to be added first in a Koa app in order to capture all exceptions. Next stop is the router:
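A sketch of the router; I'm assuming @koa/router and the two endpoint paths, which may differ from the original:

```javascript
// router.js — a sketch; package choice and paths are assumptions
const Router = require('@koa/router');
const nostream = require('./handlers/nostream');
const stream = require('./handlers/stream');

const router = new Router();

router.get('/nostream', nostream); // buffers everything, then responds
router.get('/stream', stream);     // streams rows as they arrive

module.exports = router;
```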

An interesting thing to mention is that in app.js I use the router.allowedMethods() middleware, which handles unsupported methods with 405 Method Not Allowed for me. Let's look at the handlers next. nostream:
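A sketch of the nostream handler; module paths are assumptions:

```javascript
// handlers/nostream.js — a sketch: the whole result set is loaded
// into memory before the response is sent
const db = require('../db');

const nostream = async ctx => {
  // the await buffers every row in memory at once
  const users = await db.select('*').from('users');
  ctx.body = users; // Koa serializes the array to JSON in one chunk
};

module.exports = nostream;
```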

Pretty sure you're wondering what that db looks like:
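db.js could look roughly like this; the config key names are my assumption:

```javascript
// db.js — a sketch; config key names are guesses
const knex = require('knex');
const config = require('./config');

// Node caches modules, so this file is evaluated once: every
// require() of it receives the same Knex instance (one pool).
module.exports = knex({
  client: 'pg',
  connection: {
    host: config.postgresHost,
    user: config.postgresUser,
    password: config.postgresPassword,
    database: config.postgresDb,
  },
});
```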

config is just a simple object literal that exposes some env vars. You can see it here. I require knex, establish a DB connection and export an instance of it. Since Node caches modules as singletons, no matter how many times I require this package, initialization will be performed only once. Everything else in the nostream handler is pretty straightforward. Please refer to the Koa or Knex documentation for more details. Next, let's look at the stream handler:
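A sketch of that handler. I'm assuming a promisified stream.pipeline as the "pipe function" and a JSONStream-style stringify; the article's exact packages may differ:

```javascript
// handlers/stream.js — a sketch; package choices are assumptions
const { pipeline } = require('stream');
const { promisify } = require('util');
const { stringify } = require('JSONStream');
const db = require('../db');

// pipe resolves when streaming finishes and rejects on stream errors
const pipe = promisify(pipeline);

const stream = async ctx => {
  ctx.set('content-type', 'application/json');
  ctx.status = 200;
  ctx.respond = false; // hand the raw res over to the pipeline

  try {
    // rows flow to the client as they arrive instead of buffering
    await pipe(db.select('*').from('users').stream(), stringify(), ctx.res);
  } catch (err) {
    ctx.res.destroy(err); // headers may already be sent; abort the response
  }
};

module.exports = stream;
```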

Here I call .stream() on my query to get a stream instance. I then pipe this query stream to the client. To make that work I have a pipe function that returns a promise, so we don't exit the handler instantly but rather wait until streaming is done or an error occurs. I added the stringify package here because the response (a writable stream) expects input of type string or an instance of Buffer, while the DB stream operates on objects.

By the way, all ctx.throw calls will be captured by the errorHandler middleware we created previously. Now, before I jump into testing my server, I need some data in the DB. Since I already have Knex installed locally, I need a config file for it to be able to run migrations, seeds etc.:

./node_modules/.bin/knex init

This will generate knexfile.js. I have slightly changed it to use my env vars. You can see it here. Now I’ll generate the migration:

./node_modules/.bin/knex migrate:make users

This will create a migration file at migrations/{timestamp}_users.js. You can review it here. Before I run it, I'll start the application:

docker-compose up --build

Now that the app is up and running, it's time to run the migration:

POSTGRES_HOST=localhost npm run migrate:up

I specify the Postgres host here because I run the migration from the host machine against a DB that runs in the Docker environment under the pg service name; if I didn't, the pg name from the env vars would be used, which would end in a failed connection attempt.
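For reference, the users migration could look roughly like this; the column set is a guess, since the real file lives in the linked repo:

```javascript
// migrations/{timestamp}_users.js — a sketch; real columns are in
// the linked repo
exports.up = knex =>
  knex.schema.createTable('users', table => {
    table.increments('id'); // auto-incrementing primary key
    table.string('name').notNullable();
    table.string('email').notNullable();
    table.timestamps(true, true); // created_at / updated_at with defaults
  });

exports.down = knex => knex.schema.dropTable('users');
```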

Now what I need is some records in the users table for my API to serve. The Knex CLI can ease this process: it has a command to generate a seed file:

./node_modules/.bin/knex seed:make create_users

This will generate a seed file in the seeds directory. You can see what I did with it here. If you run into issues running it, try reducing the number of records it creates; currently it will try to create 60k records. Let's run it:

POSTGRES_HOST=localhost npm run db:seed
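For reference, the seed could look roughly like this; the row shape is a guess, and I'm using Knex's batchInsert to keep each INSERT under Postgres' parameter limit, which may differ from the original:

```javascript
// seeds/create_users.js — a sketch; the real file is in the linked repo
exports.seed = async knex => {
  await knex('users').del(); // start from a clean table

  const users = Array.from({ length: 60000 }, (_, i) => ({
    name: `user-${i}`,
    email: `user-${i}@example.com`,
  }));

  // insert in chunks of 1000 rows per statement
  await knex.batchInsert('users', users, 1000);
};
```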

OK, with all that in place, I can now test both endpoints and see how well they perform. Let's start with the nostream endpoint:

And now stream endpoint:

So, streaming might be the less memory-intensive operation, but it takes a bit longer to retrieve all the data (60k users in this case). That certainly pays off when sending or processing large amounts of data, and could be a perfect fit for file processing or BI, where we have to work with large data sets. When it comes to common tasks, though, say an admin panel that retrieves 10–100 records per page, streaming most likely wouldn't be the best fit.

Not only because it takes a bit more time, but also due to code complexity: it requires more code to achieve the same output. It all boils down to the good old saying: "choose the right tool for the job". I hope you've learned something useful. You can find the whole project here.


Dan Stenger

Software engineer focusing on simplicity and reliability. Go and functional programming enthusiast.