Let's Build a Thing: NPM Dependency Checker - Part 3

Picture shows Jason A. Martin - Software Engineer, Indie Game Developer, Tech Evangelist, Entrepreneur.

Note: The series starts, along with dev notes, at: Let's Build: NPM Dependency Checker

Previous part: Let's Build: NPM Dependency Checker - Part 2

Core Coding

GitHub: NPM Dependency Checker branch: get-package-info

Note: Unless noted, the coding for this part takes place in the ndc.ex file in /lib/

So our project is set up and we are ready to rock n roll. Let's start thinking about the small functions this application will need to do its job.

Project Dependencies

We're going to start using a couple new libraries in this part, so we need to update mix.exs so it knows about them. We also need to grab those dependencies.

Open mix.exs and make sure your deps section looks like this:

defp deps do
    [
      {:poison, "~> 2.0"},
      {:httpoison, "~> 0.9.0"}
    ]
end

Staying in mix.exs, make sure your application section looks like this:

def application do
    [applications: [:logger, :httpoison, :poison]]
end

We'll go over why we're adding these later in this article.

Getting the Repo's URL

Our first hurdle is repo information. How are we going to get it and what do we need?

Before we can do anything we need to figure out the GitHub URL for a given repo. There are a couple of ways to do this:

  1. We could visit the NPMJS.org page for the repo.
  2. We could use "npm view <package>" on the command line, which will give us back a large blob of text about the repo.

Neither option is super clean, but the second is best. If we were to choose the first option, we'd need to fetch a webpage and then hunt through it looking for a node that contained a link to the repo and then get that link. Using a combo such as HTTPotion and Floki is easy enough, but the problem comes with the lack of great identifiers in the NPMJS page code and the number of links on the page.

Instead, we're going to execute a system command and work through the output.

Let's create a simple function that accepts one argument, performs the system command we want and returns the output.

def npm_view(package) do
  System.cmd("npm", ["view", package])
end

When this function is executed, you'll get back a massive (and disgusting) output, like this:

{" \n{ name: 'jest',\n  description: 'Painless JavaScript Unit Testing.',\n  'dist-tags': \n   { latest: '15.1.1',\n     next: '12.1.0-alpha1',\n     test: '14.3.2-alpha.83c25417' },\n  versions: \n   [ '0.0.6',\n     '0.0.7',\n     '0.0.61',\n     '0.0.71',\n     '0.0.72',\n     '0.0.73',\n     '0.0.74',\n     '0.0.75',\n     '0.0.76',\n     '0.0.78',\n     '0.0.79',\n     '0.0.80',\n     '0.0.81',\n     '0.0.82',\n     '0.0.83',\n     '0.0.84',\n     '0.0.85',\n     '0.0.86',\n     '0.0.87',\n     '0.0.88',\n     '0.0.89',\n     '0.0.90',\n     '0.0.91',\n
... and on for some time ...
homepage: 'https://github.com/facebook/jest#readme',\n  license: 'BSD-3-Clause',\n  version: '15.1.1',\n  main: 'build/jest.js',\n  dependencies: { 'jest-cli': '^15.1.1' },\n  bin: { jest: './bin/jest.js' },\n  engines: { node: '>= 4' },\n  scripts: {},\n  dist: \n   { shasum: 'd02972b3ba27067b7713e44219b4731aa48540a6',\n     tarball: 'https://registry.npmjs.org/jest/-/jest-15.1.1.tgz' },\n  directories: {} }\n\n"
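The value that comes back from System.cmd is a two-element tuple: the command's output as a string and its exit status. Here is a minimal sketch of checking that status before trusting the output. The module name CmdSketch and its run/2 function are made up for illustration, and echo stands in for npm so the example runs on a Unix-like machine without Node installed.

```elixir
defmodule CmdSketch do
  # Hypothetical helper, not part of the ndc project.
  # Wraps System.cmd/3 and tags the result by exit status.
  def run(command, args) do
    case System.cmd(command, args, stderr_to_stdout: true) do
      # Exit status 0 means the command succeeded.
      {output, 0} -> {:ok, output}
      # Any other status is treated as a failure here.
      {output, status} -> {:error, status, output}
    end
  end
end

# `echo` stands in for `npm view <package>`.
IO.inspect(CmdSketch.run("echo", ["hello"]))
```

Note that System.cmd raises if the executable itself cannot be found, so this only covers commands that start and then fail.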

If you want to see a pretty version, just run this command in your terminal:

npm view jest

Ok, so that's complete and working. What we're after is that homepage link. Let's pipe this output into a function that cleans it up a bit.

def get_package_repo_url(data) do
    ## sending output from npm view on the command line.
    Regex.run(~r/homepage: \'(.*)\'/, elem(data,0)) |> Enum.at(1)
end

Given our original data, Regex.run returns something like this:

["homepage: 'https://github.com/facebook/jest#readme'",
 "https://github.com/facebook/jest#readme"]

As you can see, it's a list with two elements: the full match and the captured URL. The get_package_repo_url function returns the second element, which is the URL we need.

We now have the repo's URL and can move forward.
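Regex.run returns nil when nothing matches, so a package without a homepage line would blow up in Enum.at. Here is a small sketch of the same extraction with that case handled. HomepageSketch and its sample string are assumptions for illustration; the sample is a shortened stand-in for real npm view output.

```elixir
defmodule HomepageSketch do
  # Hypothetical module; @sample is a trimmed stand-in for `npm view` output.
  @sample "{ name: 'jest',\n  homepage: 'https://github.com/facebook/jest#readme',\n  license: 'BSD-3-Clause' }\n"

  def sample, do: @sample

  # Returns {:ok, url} when a homepage line is found, :error otherwise.
  def homepage(output) do
    case Regex.run(~r/homepage: '(.*)'/, output) do
      [_full_match, url] -> {:ok, url}
      nil -> :error
    end
  end
end

IO.inspect(HomepageSketch.homepage(HomepageSketch.sample()))
IO.inspect(HomepageSketch.homepage("{ name: 'no-homepage' }"))
```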

Package Hunting

While we could use the URL we have at this point (https://github.com/facebook/jest#readme) and then find the package.json link and then click it and so forth, there's an easier way.

What we're really after now is the source code for the package.json file. In that file are our dependencies and devDependencies.

If you go to a GitHub repo, click on a file to view its source and then click on the raw button to view only its source code and nothing else, you'll notice that the url is something like this:

https://raw.githubusercontent.com/facebook/jest/master/package.json

Our current url is: https://github.com/facebook/jest#readme

We need a function that will take in a url and replace some things. Let's do that now.

  1. We are going to dump "#readme" from the link if it's there as we don't need it.
  2. We are going to replace "github" with "raw.githubusercontent".
  3. We are going to add "/master/package.json" to the end of the string to complete the url.

def transform_repo_raw_json_url(repo) do
  ## we will do a little replace therapy to fetch the raw json.
  ## end result is: https://raw.githubusercontent.com/someuser/somerepo/master/package.json
  repo
  |> String.replace("#readme", "")
  |> String.replace("github", "raw.githubusercontent")
  |> (fn x -> x <> "/master/package.json" end).()
end

In transform_repo_raw_json_url we pass in a repo, which is a URL string (like: "https://github.com/facebook/jest#readme").

We then perform two replaces and one concatenation. The end result is a url like this: https://raw.githubusercontent.com/someuser/somerepo/master/package.json
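To see the three steps on a real value, here is a standalone sketch. RawUrlSketch is a made-up module name; the steps are the same replaces and concatenation described above, repeated so the example runs on its own.

```elixir
defmodule RawUrlSketch do
  # Same steps as transform_repo_raw_json_url, repeated so this stands alone.
  def transform(repo) do
    base =
      repo
      |> String.replace("#readme", "")
      |> String.replace("github", "raw.githubusercontent")

    base <> "/master/package.json"
  end
end

IO.puts(RawUrlSketch.transform("https://github.com/facebook/jest#readme"))
```

Two things to keep in mind with this approach: String.replace swaps every occurrence by default, so a URL where "github" also appears in the user or repo name would have both replaced, and the result assumes the repo's default branch is named master.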

We're making great progress! From here, we can fetch that page and have the data we need.

There's one problem, though. If we fetch that page, what comes back is a plain string of JSON text, not a data structure we can work with. We're going to fix that shortly.

Fetching JSON

At this point we have the url for the package.json file we want to consume. Our goal is to read the dependencies so that we can do stuff with them.

This function will be reworked in an upcoming segment. For now, we want to get it working.

We are going to use HTTPoison to do our fetching. As a side note, there is also HTTPotion.

Rather than throw a .get call into an existing function, I'm moving the call into its own function so that I can handle errors and so that the code is easier to read and maintain.

In the following function we will pass in repo, which is the package.json url. If the url is valid, the function will send the body of that response to another function (explained in a moment).

def fetch_package_json(repo) do
  case HTTPoison.get(repo) do
    {:ok, %HTTPoison.Response{status_code: 200, body: body}} ->
      decode_body(body)
    {:ok, %HTTPoison.Response{status_code: 404}} ->
      IO.puts "This is not the repo you are looking for."
    {:error, %HTTPoison.Error{reason: reason}} ->
      IO.inspect reason
  end
end

If all goes well, the contents of the package.json file for the repo are now being sent to decode_body.
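The case in fetch_package_json covers a 200, a 404, and a transport error. Any other status, such as a redirect, matches none of the clauses and would raise a CaseClauseError. Below is one possible sketch of a catch-all that returns tagged tuples instead of printing. ResponseSketch and handle/1 are hypothetical names, and plain maps stand in for the HTTPoison structs so the example runs without the dependency.

```elixir
defmodule ResponseSketch do
  # Hypothetical helper. Plain map patterns are used so the example runs
  # without HTTPoison; map patterns also match structs, because a struct
  # is a map underneath.
  def handle({:ok, %{status_code: 200, body: body}}), do: {:ok, body}
  def handle({:ok, %{status_code: 404}}), do: {:error, :not_found}
  def handle({:ok, %{status_code: code}}), do: {:error, {:unexpected_status, code}}
  def handle({:error, %{reason: reason}}), do: {:error, reason}
end

# Hand-built stand-ins for what HTTPoison.get/1 would return.
IO.inspect(ResponseSketch.handle({:ok, %{status_code: 200, body: "{}"}}))
IO.inspect(ResponseSketch.handle({:ok, %{status_code: 301}}))
```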

Decoding JSON

The body of the response we got back is JSON text, but to Elixir it's just a nice messy string. Take a look:

"{\n  \"private\": true,\n  \"devDependencies\": {\n    \"babel-core\": \"^6.14.0\",\n    \"babel-eslint\": \"^6.1.2\",\n    \"babel-plugin-syntax-trailing-function-commas\": \"^6.13.0\",\n    \"babel-plugin-transform-es2015-destructuring\": \"^6.9.0\",\n    \"babel-plugin-transform-es2015-parameters\": \"^6.11.4\",\n    \"babel-plugin-transform-flow-strip-types\": \"^6.14.0\",\n    \"chalk\": \"^1.1.3\",\n    \"codecov\": \"^1.0.1\",\n    \"eslint\": \"^3.4.0\",\n    \"eslint-plugin-babel\": \"^3.3.0\",\n    \"eslint-plugin-flow-vars\": \"^0.5.0\",\n    \"eslint-plugin-flowtype\": \"^2.16.1\",\n    \"eslint-plugin-react\": \"^6.2.0\",\n    \"flow-bin\": \"^0.31.1\",\n    \"glob\": \"^7.0.6\",\n    \"graceful-fs\": \"^4.1.6\",\n    \"istanbul-api\": \"^1.0.0-aplha.10\",\n    \"istanbul-lib-coverage\": \"^1.0.0\",\n    \"jasmine-reporters\": \"^2.2.0\",\n    ...snip ...
\"progress\":  jest-coverage -- -i && npm run test-examples && node scripts/mapCoverage.js && codecov\",\n    \"test-examples\": \"node scripts/test_examples.js\",\n    \"typecheck\": \"flow check\",\n    \"watch\":   \"testPathIgnorePatterns\": [\n      \"/node_modules/\",\n      \"/examples/\",\n      \"integration_tests/.*/__tests__\",\n      \"\\\\.snap$\",\n      \"packages/.*/build\"\n    ],\n    \"testRegex\": \".*-test\\\\.js\"\n  }\n}\n"

Fortunately, there's a library called Poison that we can use to decode JSON (and encode).

So let's build a function that accepts the response JSON body and builds us a map we can use to finally get at these packages.

The first thing we need to do is create a module attribute that will hold the fields we are interested in for the decoding.

@expected_fields ~w(
   devDependencies dependencies
)

Now that we have our attribute in place we can write the decode_body function.

def decode_body(body) do
  body
  |> Poison.decode!
  |> Map.take(@expected_fields)
  |> Enum.map(fn({k, v}) -> {String.to_atom(k), v} end)
end

In our decode_body function we accept an argument and use Poison to decode it. We then take only the @expected_fields and convert their string keys to atoms, which leaves us with a tidy keyword list of the data we expect.

The end result is quite lovely. We get back a list of up to two items (dependencies and devDependencies) just like this:

[dependencies: %{"unique-random-array" => "1.0.0"},
 devDependencies: %{"babel" => "^6.1.18", "babel-cli" => "^6.2.0",
   "babel-core" => "^6.2.1", "babel-preset-es2015" => "^6.1.18",
   "chai" => "3.4.1", "coveralls" => "^2.11.4", "istanbul" => "^0.4.0",
   "mocha" => "2.3.4", "mocha-lcov-reporter" => "^1.0.0"}]

As you can see we have maps to work with. Now we can get a list of dependencies for this repo!
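To make the Map.take and Enum.map steps easier to poke at, here is a sketch that skips Poison and starts from an already-decoded map. DecodeSketch, pick/1 and the sample data are all made up for illustration.

```elixir
defmodule DecodeSketch do
  @expected_fields ~w(devDependencies dependencies)

  # `decoded` stands in for the map Poison.decode!/1 would return.
  def pick(decoded) do
    decoded
    |> Map.take(@expected_fields)
    |> Enum.map(fn {k, v} -> {String.to_atom(k), v} end)
  end
end

decoded = %{
  "name" => "example",
  "private" => true,
  "dependencies" => %{"unique-random-array" => "1.0.0"},
  "devDependencies" => %{"chai" => "3.4.1"}
}

result = DecodeSketch.pick(decoded)
IO.inspect(Keyword.keys(result))
IO.inspect(result[:dependencies])
```

Because Map.take only keeps the two expected keys, String.to_atom never sees arbitrary input. A package.json with neither field gives back an empty list, which is worth remembering when reading from the list by position.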

Parse Dependencies

We've got our list and we're closing in on a major milestone. Now we need to parse out the dependencies.

Let's build a function that accepts a dependencies_list and prints out the dependencies.

def parse_dependencies(dependencies_list) do
  Enum.at(dependencies_list, 0)
    |> (fn x -> iterate_dependencies(elem(x, 1)) end).()
end

The function parse_dependencies takes the dependencies_list and iterates over the first element in the list. Obviously this is a little bit of a problem since our dependencies_list could have multiple elements, but let's just stand this up for now and come back to it.

You'll notice that we're passing in elem(x, 1) to iterate_dependencies. That's because what we have at that point is a tuple with an atom (a label of either dependencies or devDependencies) as the first element. The second element is a map.

Let's write the iterate_dependencies function. For now, we're just going to IO.inspect it into the terminal.

def iterate_dependencies(map) do
  ## _v is the version string, which we aren't using yet.
  Enum.map(map, fn {k, _v} -> IO.inspect k end)
end
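As a rough idea of where the "first element only" limitation could go, here is a sketch that walks every entry in the keyword list and collects the package names instead of printing them. DepsSketch and names/1 are hypothetical, and this is only one possible direction, not necessarily the one the series takes.

```elixir
defmodule DepsSketch do
  # Hypothetical helper: returns the package names for every entry
  # in the keyword list instead of printing only the first one.
  def names(dependencies_list) do
    Enum.flat_map(dependencies_list, fn {_label, deps} -> Map.keys(deps) end)
  end
end

deps = [
  dependencies: %{"unique-random-array" => "1.0.0"},
  devDependencies: %{"chai" => "3.4.1", "mocha" => "2.3.4"}
]

IO.inspect(DepsSketch.names(deps))
```

An empty list simply produces an empty result here, so a package with no listed dependencies would not crash.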

Whew, that's a lot of stuff we did. Let's wrap this part up and get ready for what's next.

Tidy Up

There was a lot of coding for this part. At this point you can build the application and use it. If you feel lost, check out the branch for this part and make sure your code matches up.

Here's what you can now do:

  1. Using terminal, navigate to the ndc folder and enter mix escript.build to build the application.
  2. In terminal, use it and see the result:

$ ./ndc --pkg=jest

## OUTPUT:
NPM package: jest
"babel-core"
"babel-eslint"
"babel-plugin-syntax-trailing-function-commas"
"babel-plugin-transform-es2015-destructuring"
"babel-plugin-transform-es2015-parameters"
"babel-plugin-transform-flow-strip-types"
"chalk"
"codecov"
"eslint"
"eslint-plugin-babel"
"eslint-plugin-flow-vars"
"eslint-plugin-flowtype"
"eslint-plugin-react"
"flow-bin"
"glob"
"graceful-fs"
"istanbul-api"
"istanbul-lib-coverage"
"jasmine-reporters"
"lerna"
"minimatch"
"mkdirp"
"progress"
"rimraf"

TODO

Here are a few things on the thought list:

  • I found an error (testing is working on some repos and not others). Oh noes! The fix is coming in the next part.
  • We're doing a lot of trusting with our functions. We need tests!
  • We need to refactor parse_dependencies to handle all the elements it's given.
  • We need to create a struct or something to hold dependency data.
  • We need to log the dependency name and version number. Maybe a count too?

Let's get going on the next part.

Let's Build: NPM Dependency Checker - Part 4