Sunday, August 17, 2014

Adding hooks to gclient deps

In the previous several posts, we got to the point of being able to sync a git repo along with some dependencies. For reference, here's the .gclient file we have created so far, as well as the .DEPS.git from the my_project directory:

$ cat .gclient
solutions = [
  { "name"        : "my_project",
    "url"         : "ssh://example.com/repos/my_project.git",
    "deps_file"   : ".DEPS.git",
    "managed"     : True,
    "custom_deps" : {
    },
    "safesync_url": "",
  },
]
cache_dir = None

$ cat my_project/.DEPS.git
vars = {
  # Common settings.
  "base_url" : "ssh://example.com/repos",
  "project_directory" : "my_project",

  # Specify dependency package |package| as package_destination, package_url,
  # and package_revision tuples. Then, ensure to add the dependency in deps
  # using the variables.

  # Google test
  "googletest_destination" : "third_party/googletest",
  "googletest_url" : "/external/googletest.git",
  "googletest_revision" : "2a2740e0ce24acaae88fb1c7b1edf5a2289d3b1c",
}

deps = {
  # Google test
  Var("project_directory") + "/" + Var("googletest_destination") :
      Var("base_url") + Var("googletest_url") + "@" + Var("googletest_revision")
}

Now, in the last post, I noted that the initial checkout seems to happen in a detached state. That is, git status prints out something along the lines of the following:

HEAD detached at origin/master

I would like my initial checkouts to always go to the master branch, since I know I will forget to switch to the master branch when making changes. This seems like a good application of a hook.

As you recall, .DEPS.git from Chromium had a section called hooks. Great! I would imagine that's exactly what we want.

The structure seems to be as follows:

hooks = [
  {
    "name" : "hook_name",
    "pattern" : "hook_pattern",
    "action" : ["hook_action", "hook_action_parameter1"]
  }
]

Name and action are fairly self-explanatory, but I'm not too clear about what the pattern is supposed to represent. Run this hook only if something matches this pattern? That would make sense. However, most of the hooks in Chromium's .DEPS.git have "." as the pattern, so let's stick with that for now.
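For what it's worth, here's how I picture the pattern working, as a rough Python sketch. This is a guess on my part and not gclient's actual code: should_run_hook and changed_files are names I made up, and I'm assuming the pattern is a regular expression searched against the paths of changed files.

```python
import re

def should_run_hook(hook, changed_files):
    """Return True if any changed file path matches the hook's pattern.

    Hypothetical helper: it illustrates one reading of "pattern" and is
    not taken from gclient.
    """
    pattern = re.compile(hook["pattern"])
    return any(pattern.search(path) for path in changed_files)

hook = {"name": "hook_name", "pattern": ".", "action": ["echo", "hi"]}

# "." matches any non-empty path, so this hook would always qualify.
print(should_run_hook(hook, ["my_project/README"]))

# A narrower (made-up) pattern only fires for matching paths.
gyp_hook = {"name": "gyp", "pattern": r"\.gypi?$", "action": ["echo", "gyp"]}
print(should_run_hook(gyp_hook, ["my_project/README"]))
print(should_run_hook(gyp_hook, ["my_project/build/all.gyp"]))
```

Under that reading, a pattern of "." is just a way of saying "always run this hook".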

Let's just jump in and add a simple hook into our .DEPS.git:

...
hooks = [
  {
    "name" : "hello",
    "pattern" : ".",
    "action" : ["echo", "hello world!"]
  }
]

This should do something:

$ gclient runhooks

________ running 'echo hello world!' in '/tmp/learning_gclient'
hello world!
$ gclient sync
Syncing projects: 100% (2/2), done.                             

________ running 'echo hello world!' in '/tmp/learning_gclient'
hello world!

Perfect! We can run hooks separately, and the hooks are run when we sync. That's exactly what we want. One observation I have is that it's probably a good idea to stick to some sort of cross-platform scripting language when writing hooks, since 'echo' on my machine might not exist on some other machine. Since gclient itself is written in Python, it's a good bet that Python is installed. As such, let's stick with Python as the hooks language.

Also note that we're running this hook in the same directory as the .gclient file (/tmp/learning_gclient in my case). Switching into my_project and running hooks again confirms that we're always running it from the same directory as the .gclient file.

Alright, let's just jump in and write a hook that checks out master if the current state is "HEAD detached at origin/master". Learning Python is out of the scope of this post, but after some googling on how to do things in it I came up with this:

$ cat my_project/hooks/checkout_master.py
import os
from subprocess import Popen, PIPE

def main():
  os.chdir("my_project")
  p = Popen(['git', 'status'], stdout=PIPE)
  output, err = p.communicate()

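  # Only the first line of "git status" matters. Decode it so that the
  # comparison below also works on Python 3, where output is bytes.
  line = output.splitlines()[0].decode().strip()
  if line != 'HEAD detached at origin/master':
    print('not an initial checkout, skip checkout master')
    return

  print('checking out master branch')
  p = Popen(['git', 'checkout', 'master'], stdout=PIPE, stderr=PIPE)
  output, err = p.communicate()
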
if __name__ == "__main__":
  main()

Basically, we switch to my_project (I don't like hardcoding the project name here, but it will do for now), get the output of "git status", and if the first line of that is 'HEAD detached at origin/master', we run "git checkout master". Now, let's add this hook into our .DEPS.git, replacing the hello world one:

...
hooks = [ 
  {
    "name" : "checkout_master",
    "pattern" : ".",
    "action" : ["python", Var("project_directory") + "/hooks/checkout_master.py"]
  }
]

Let's see if that works as expected:

$ gclient runhooks

________ running '/usr/bin/python my_project/hooks/checkout_master.py' in '/tmp/learning_gclient'
not an initial checkout, skip checkout master

Right, that's because during my testing, I already switched to the master branch. Let's just delete the whole project and sync again. However, remember to commit and push your changes first. During my first attempt, I removed the directory and lost all of my changes (i.e., I had to write the script and the hooks again).

$ rm -rf my_project
$ gclient sync
Syncing projects: 100% (2/2), done.                             

________ running '/usr/bin/python my_project/hooks/checkout_master.py' in '/tmp/learning_gclient'
checking out master branch
$ cd my_project/
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.

nothing to commit, working directory clean

Excellent! Everything seems to be working as intended. I think I'm happy enough with this basic gclient setup to move on to the actual build system using either GYP or GN.

- vmpstr

P.S. To eliminate the project name from checkout_master.py (thus making it a generic hook for any project), we should just move the project name to be a parameter. Here's a diff that makes it so:

diff --git a/.DEPS.git b/.DEPS.git
index 482ee29..56555cd 100644
--- a/.DEPS.git
+++ b/.DEPS.git
@@ -23,6 +23,10 @@ hooks = [
   {
     "name" : "checkout_master",
     "pattern" : ".",
-    "action" : ["python", Var("project_directory") + "/hooks/checkout_master.py"]
+    "action" : [
+      "python",
+      Var("project_directory") + "/hooks/checkout_master.py",
+      Var("project_directory")
+    ]
   }
 ]
diff --git a/hooks/checkout_master.py b/hooks/checkout_master.py
index f41eefa..b5cf30a 100644
--- a/hooks/checkout_master.py
+++ b/hooks/checkout_master.py
@@ -1,8 +1,10 @@
 import os
 from subprocess import Popen, PIPE
+import sys
 
 def main():
-  os.chdir("my_project")
+  if len(sys.argv) == 2:
+    os.chdir(sys.argv[1])
   p = Popen(['git', 'status'], stdout=PIPE)
   output, err = p.communicate()
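
One more variation, sketched under my own assumptions: pulling the first-line check into a small function makes it possible to test it without a git repo around. is_initial_checkout is a made-up name, and the function only looks at text that git status has already printed.

```python
def is_initial_checkout(status_output):
    """Return True if `git status` output starts with the detached-HEAD line.

    Hypothetical helper: it only inspects text and never runs git.
    """
    lines = status_output.splitlines()
    if not lines:
        # No output at all (for example, git failed): treat as "not initial".
        return False
    return lines[0].strip() == "HEAD detached at origin/master"

print(is_initial_checkout("HEAD detached at origin/master\nnothing to commit"))
print(is_initial_checkout("On branch master\nnothing to commit"))
print(is_initial_checkout(""))
```

The empty-output case is handled explicitly here, since indexing the first line of an empty list would otherwise raise an IndexError.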

Saturday, August 16, 2014

Adding gclient deps

Up until now, we have basically simulated a git clone using gclient. We can now start with an empty directory (and a .gclient file), do a

$ gclient sync

and end up with a clone of whatever solution (project) was specified in the .gclient file.

The next thing I'd like to do is add some dependencies. As you recall, we told gclient config to use git dependencies, which means we need to come up with a .DEPS.git file that specifies the dependencies of the project. I believe that you can use any filename you want, as long as the .gclient file reflects that, but don't quote me on that.

So far, here's what I have:

$ ls -1a
.
..
.gclient

$ cat .gclient
solutions = [
  { "name"        : "my_project",
    "url"         : "ssh://example.com/repos/my_project.git",
    "deps_file"   : ".DEPS.git",
    "managed"     : True,
    "custom_deps" : {
    },
    "safesync_url": "",
  },
]
cache_dir = None

$ gclient sync
Syncing projects: 100% (1/1), done.

$ ls -a
.
..
.gclient
.gclient_entries
my_project

There's a new file, .gclient_entries, and my project clone (the same thing I would get if I did a git clone ssh://example.com/repos/my_project.git).

For completeness,

$ cat .gclient_entries
entries = {
  'my_project': 'ssh://example.com/repos/my_project.git',
}

That seems to be a simple map of project name (possibly directory) to the git url of the project. I'll just leave that alone.
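
Since the file looks like a plain Python assignment, it can be read back with a few lines of Python. This is only a sketch: read_entries is a name I made up, and I'm assuming the file really is valid Python, as the cat output suggests.

```python
def read_entries(text):
    """Evaluate .gclient_entries-style text and return its `entries` dict.

    Assumption: the file is plain Python assignments. exec runs whatever
    is in the text, so only use this on files you trust.
    """
    scope = {}
    exec(text, {}, scope)
    return scope.get("entries", {})

sample = """
entries = {
  'my_project': 'ssh://example.com/repos/my_project.git',
}
"""

for path, url in sorted(read_entries(sample).items()):
    print(path, "->", url)
```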

Now, I'd like to add a dependency to this project. I will probably have a lot of dependencies, but let's just add googletest.

It has its own repository and uses subversion as far as I can tell. Since I prefer to have all of my code and dependencies on one host, I actually went ahead and cloned googletest, then created a git repo on my host (what I refer to as example.com here), and committed it there. So, googletest is available for me from ssh://example.com/repos/external/googletest.git. Note that in the future, it might be worth experimenting with adding the dependency straight from the source, but for now I want to be able to keep track of everything.

Now, to add a .DEPS.git to the main project. Ha! I have no idea what it is supposed to look like. Since I know Chromium has a bunch of dependencies, that's probably a good spot to start digging. Conveniently, Chromium has a very useful code search tool that allows us to take a look at the full source code without needing to clone a local copy: cs.chromium.org. Searching for .DEPS.git gets me pretty quickly to this file.

It looks like it has the following structure:

vars = {
 ...
}

deps = {
 ...
}

deps_os = {
 ...
}

include_rules = [
 ...
]

skip_child_includes = [
 ...
]

hooks = [
 ...
]

It also supports comments like most good files. Now, since gclient didn't mind when I didn't have this file, I'm hoping that it doesn't mind if I omit some of the sections. In particular, since I'm only trying to get one dependency to clone, I'm not going to put in anything after deps (i.e., deps_os, include_rules, etc.).

Also, since I want this to be maintainable, I'm going to define convenient vars and use them in deps. Here's my first attempt:

vars = {
  # Common settings.
  "base_url" : "ssh://example.com/repos",

  # Specify dependency package |package| as package_destination,
  # package_url, and package_revision tuples. Then, ensure to
  # add the dependency in deps using the variables.

  # Google test
  "googletest_destination" : "third_party/googletest",
  "googletest_url" : "/external/googletest.git",
  "googletest_revision" : "2a2740e0ce24acaae88fb1c7b1edf5a2289d3b1c",
}

deps = {
  # Google test
  Var("googletest_destination") :
      Var("base_url") + Var("googletest_url") + "@" + Var("googletest_revision")
}

Most of the fields are self-explanatory. Googletest_revision refers to the git hash of the latest (and in my case only) git commit. You can get this via git log if you clone the repo separately.
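
To convince myself of what the deps entry expands to, here's a small Python sketch that mimics it. The Var function below is my own stand-in and not gclient's implementation, and splitting on the last "@" is my assumption about how the url@revision string is meant to be read.

```python
# Same values as in my .DEPS.git. Note that this shadows Python's built-in
# vars(), which is harmless in a throwaway script.
vars = {
    "base_url": "ssh://example.com/repos",
    "googletest_destination": "third_party/googletest",
    "googletest_url": "/external/googletest.git",
    "googletest_revision": "2a2740e0ce24acaae88fb1c7b1edf5a2289d3b1c",
}

def Var(name):
    """Stand-in for Var(): look the name up in the vars dictionary."""
    return vars[name]

deps = {
    Var("googletest_destination"):
        Var("base_url") + Var("googletest_url") + "@" + Var("googletest_revision"),
}

for destination, spec in deps.items():
    # Split on the last "@" so the URL itself is left alone.
    url, revision = spec.rsplit("@", 1)
    print(destination, url, revision)
```

So the key is where the dependency lands, and the value is where it comes from plus the revision to check out.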

Let's see what sync gets us:

$ gclient sync
Syncing projects: 100% (2/2), done.

That seems to have worked, 2/2 is a good thing, but...

$ ls my_project/
README

my_project only has README that I added to it independently. Let's see what happened. Asking gclient to print more information gets us the following:

$ gclient sync --verbose
solutions = [
  { "name"        : "my_project",
    "url"         : "ssh://example.com/repos/my_project.git",
    "deps_file"   : ".DEPS.git",
    "managed"     : True,
    "custom_deps" : {
    },
    "safesync_url": "",
  },
]
cache_dir = None


my_project (Elapsed: 0:00:01)
----------------------------------------
[0:00:00] Started.
_____ my_project at refs/remotes/origin/master
[0:00:01] Fetching origin
Checked out revision ec01ec9b2387175083549cb155d5aa00a6311ed0
[0:00:01] Finished.
----------------------------------------

third_party/googletest (Elapsed: 0:00:00)
----------------------------------------
[0:00:01] Started.
_____ third_party/googletest at 2a2740e0ce24acaae88fb1c7b1edf5a2289d3b1c
[0:00:01] Up-to-date; skipping checkout.
Checked out revision 2a2740e0ce24acaae88fb1c7b1edf5a2289d3b1c
[0:00:01] Finished.
----------------------------------------

Hmm everything up to date. Ah! The problem is that it seems to have checked out googletest relative to the .gclient file, not relative to the project:

$ ls -1
my_project
third_party

I guess that's useful in some scenarios, but I would really prefer to keep my third_party libs inside my_project directory. That's easy enough to fix. Here's an updated .DEPS.git:

vars = {
  # Common settings.
  "base_url" : "ssh://example.com/repos",
  "project_directory" : "my_project",

  # Specify dependency package |package| as package_destination,
  # package_url, and package_revision tuples. Then, ensure to
  # add the dependency in deps using the variables.

  # Google test
  "googletest_destination" : "third_party/googletest",
  "googletest_url" : "/external/googletest.git",
  "googletest_revision" : "2a2740e0ce24acaae88fb1c7b1edf5a2289d3b1c",
}

deps = {
  # Google test
  Var("project_directory") + "/" + Var("googletest_destination") :
      Var("base_url") + Var("googletest_url") + "@" + Var("googletest_revision")
}

I added a project_directory variable and modified deps to use that as the leading directory before googletest_destination. Let's see if that does it:

$ gclient sync
Syncing projects: 100% (2/2), done.                             

WARNING: 'third_party/googletest' is no longer part of this client.  It is recommended that you manually remove it.

$ ls -1 my_project
README
third_party

That seems to have worked with a useful message reminding me that third_party/googletest (not my_project/third_party/googletest) was removed, so I should clean that up.

Running gclient sync again does not produce the message again. This means that gclient records what dependencies I had on the last run, and warns me if they are no longer in the new .DEPS.git (protip: it's in .gclient_entries). Got it. I'll remove third_party/googletest.
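
My mental model of that warning, as a Python sketch: compare the paths recorded on the last run with the paths requested now. stale_entries is a made-up name, the two dictionaries are my reconstruction of what was recorded before and after the change, and none of this is gclient's real code.

```python
def stale_entries(previous_entries, current_deps):
    """Return paths recorded on the last sync that are no longer requested."""
    return sorted(set(previous_entries) - set(current_deps))

# Reconstructed for illustration; the exact recorded values may differ.
previous = {
    "my_project": "ssh://example.com/repos/my_project.git",
    "third_party/googletest": "ssh://example.com/repos/external/googletest.git",
}
current = {
    "my_project": "ssh://example.com/repos/my_project.git",
    "my_project/third_party/googletest":
        "ssh://example.com/repos/external/googletest.git",
}

for path in stale_entries(previous, current):
    print("WARNING: '%s' is no longer part of this client." % path)
```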

Another cool thing is that I can run gclient sync from any subdirectory and it seems to find the .gclient file. That's kind of useful.
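
I haven't looked at how gclient does this, but a plausible way is to walk up the directory tree until a .gclient file shows up. Here's a sketch of that idea; find_gclient_root is my own name for it.

```python
import os

def find_gclient_root(start_dir):
    """Walk upward from start_dir until a directory containing .gclient is found.

    Returns the directory path, or None if the filesystem root is reached.
    """
    current = os.path.abspath(start_dir)
    while True:
        if os.path.isfile(os.path.join(current, ".gclient")):
            return current
        parent = os.path.dirname(current)
        if parent == current:
            # dirname() of the root is the root itself: nothing left to check.
            return None
        current = parent

print(find_gclient_root(os.getcwd()))
```

Run from anywhere inside my checkout, this would print the directory that holds the .gclient file; elsewhere it prints None.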

Ok, as a last task let's do a bit of cleanup. From within my_project, I get the following:

$ git status
HEAD detached at origin/master
Untracked files:
  (use "git add <file>..." to include in what will be committed)

 .DEPS.git
 third_party/

nothing added to commit but untracked files present (use "git add" to track)

I'll keep .DEPS.git, so I'll add it to the repo, but I don't want to add third_party or be constantly reminded about it, so I'll put third_party into .gitignore and add that instead.

$ git add .DEPS.git
$ echo third_party > .gitignore
$ git add .gitignore
$ git status
HEAD detached at origin/master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

 new file:   .DEPS.git
 new file:   .gitignore

Better. "HEAD detached at origin/master" is kind of worrying me though.

$ git commit -a
[detached HEAD 4e6f8b3] Added deps and gitignore
 2 files changed, 22 insertions(+)
 create mode 100644 .DEPS.git
 create mode 100644 .gitignore
$ git push
fatal: You are not currently on a branch.
To push the history leading to the current (detached HEAD)
state now, use

    git push origin HEAD:<name-of-remote-branch>

That sucks, I want to be on master when I sync initially. For now,

$ git push origin HEAD:master
Counting objects: 5, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (4/4), 661 bytes | 0 bytes/s, done.
Total 4 (delta 0), reused 0 (delta 0)
To ssh://example.com/repos/my_project.git
   ec01ec9..4e6f8b3  HEAD -> master
$ git status
HEAD detached from ec01ec9
nothing to commit, working directory clean
$ gclient sync
Syncing projects: 100% (2/2), done.                             
$ git status
HEAD detached from ec01ec9
nothing to commit, working directory clean
$ git checkout master
Previous HEAD position was 4e6f8b3... Added deps and gitignore
Switched to branch 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
  (use "git pull" to update your local branch)
$ gclient sync
Syncing projects: 100% (2/2), done.                             
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.

nothing to commit, working directory clean

Ok, everything seems to be working. I'm hoping that I can add some sort of a hook to checkout master when syncing, as I don't want to always be remembering to checkout master. But that's for the next post.

- vmpstr

Getting basic gclient checkout setup.

In order to get our gclient workflow/build system going we obviously need to get gclient. So how do we do this?

Googling for gclient takes us to http://code.google.com/p/gclient/, which says that gclient is now a part of depot_tools with a link to depot_tools (http://dev.chromium.org/developers/how-tos/depottools) and an svn checkout command to get the latest set of tools:

$ svn checkout http://src.chromium.org/svn/trunk/tools/depot_tools

So, let's do just that. We need to check out the tools to somewhere accessible by the user, since this will be a set of tools we will consistently use for all the projects we might create. I put it under my home directory. You might have a better organizational structure; whatever works.

As an aside, it's somewhat discouraging to see that the depot_tools website talks primarily about Chromium source code. However, I think we can make it work with any project.

Now that I have ~/depot_tools checked out, I can see that it has quite a few tools, including gclient.

Next, we need to add this to our path, so that we can execute the commands without specifying the full path. That's easy enough:

$ export PATH=~/depot_tools:$PATH

I put that in my .bashrc so that it's always set whenever I start a terminal. Note that on my macbook, I also had to edit .bash_profile and put

source ~/.bashrc

in it, since that's the file that actually runs when I open a terminal: the terminal starts a login shell.

Moving on. I can now run gclient, so let's see what that gives me:

$ gclient
Usage: gclient.py <command> [options]

Meta checkout manager supporting both Subversion and GIT.

Commands are:
  cleanup  cleans up all working copies
  config   creates a .gclient file in the current directory
  diff     displays local diff for every dependencies
  fetch    fetches upstream commits for all modules
  grep     greps through git repos managed by gclient
  help     prints list of commands or help for a specific command
  hookinfo outputs the hooks that would be run by `gclient runhooks`
  pack     generates a patch which can be applied at the root of the tree
  recurse  operates [command args ...] on all the dependencies
  revert   reverts all modifications in every dependencies
  revinfo  outputs revision info mapping for the client and its dependencies
  runhooks runs hooks for files that have been modified in the local working copy
  status   shows modification status for every dependencies
  sync     checkout/update all modules

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -j JOBS, --jobs=JOBS  Specify how many SCM commands can run in parallel;
                        defaults to 8 on this machine
  -v, --verbose         Produces additional output for diagnostics. Can be
                        used up to three times for more logging info.
  --gclientfile=CONFIG_FILENAME
                        Specify an alternate .gclient file
  --spec=SPEC           create a gclient file containing the provided string.
                        Due to Cygwin/Python brokenness, it can't contain any
                        newlines.
  --no-nag-max          Ignored for backwards compatibility.

Ok. I think the first thing I need to do is to run config, since I know I need a .gclient file. I tried to follow instructions on http://dev.chromium.org/developers/how-tos/depottools but it seems to have a few broken links when it comes to .gclient examples. So, let's just try putting a repository on the line and run

$ gclient config ssh://example.com/repos/my_project.git

Note that this doesn't seem to do anything more than creating a .gclient file. In particular it doesn't try accessing that URL. I know this, since my first attempt had a typo and pointed to a non-existent repo.

$ cat .gclient
solutions = [
  { "name"        : "my_project",
    "url"         : "ssh://example.com/repos/my_project.git",
    "deps_file"   : "DEPS",
    "managed"     : True,
    "custom_deps" : {
    },
    "safesync_url": "",
  },
]
cache_dir = None

Ok, let's see if we can make sense of this. Solutions seems to refer to what I would call a project. It's a list of dictionaries (or something like that). Basically, each solution refers to a full checkout of some particular project. Url is pretty self-explanatory: it's the place where I can check out my project from. Deps_file refers to the dependencies of this project. More precisely, it names a file in the checkout that lists the dependencies. Managed, custom_deps, safesync_url, and cache_dir probably have their uses as well, but I'll leave that for a later post. Note that gclient config also has a --git-deps flag with an explanation that it will generate a .DEPS.git instead of DEPS as the deps file. Honestly, I'm not sure what the difference is, but since I'm going to be using git, I want to rerun gclient config with --git-deps:

$ gclient config --git-deps ssh://example.com/repos/my_project.git
$ cat .gclient
solutions = [
  { "name"        : "my_project",
    "url"         : "ssh://example.com/repos/my_project.git",
    "deps_file"   : ".DEPS.git",
    "managed"     : True,
    "custom_deps" : {
    },
    "safesync_url": "",
  },
]
cache_dir = None

Hmm, so that just changed DEPS to .DEPS.git. I don't know what I expected. Let's just work with .DEPS.git. One other thing of note is that rerunning config wiped my previous .gclient file and created a new one. This is good to know if I want to add more projects in the future. I would have to edit the .gclient file manually, instead of relying on gclient config. I didn't see any flag jump off the page that would enable me to append to a .gclient file instead of overwriting it.
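
If hand-editing gets tedious, the append could be scripted. The sketch below is entirely my own: add_solution is a made-up helper, "other_project" and its URL are hypothetical, and I'm assuming the .gclient file is plain Python that can be evaluated and written back out.

```python
import pprint

def add_solution(gclient_text, name, url, deps_file=".DEPS.git"):
    """Return new .gclient text with one more solution appended.

    Assumption: the text is trusted Python, since exec runs it as code.
    """
    scope = {}
    exec(gclient_text, {}, scope)
    solutions = list(scope.get("solutions", []))
    solutions.append({
        "name": name,
        "url": url,
        "deps_file": deps_file,
        "managed": True,
        "custom_deps": {},
        "safesync_url": "",
    })
    return ("solutions = " + pprint.pformat(solutions) + "\n"
            + "cache_dir = " + repr(scope.get("cache_dir")) + "\n")

original = """
solutions = [
  { "name"        : "my_project",
    "url"         : "ssh://example.com/repos/my_project.git",
    "deps_file"   : ".DEPS.git",
    "managed"     : True,
    "custom_deps" : {
    },
    "safesync_url": "",
  },
]
cache_dir = None
"""

# "other_project" and its URL are hypothetical.
print(add_solution(original, "other_project",
                   "ssh://example.com/repos/other_project.git"))
```

The formatting of the result won't match what gclient config writes, but it evaluates to the same kind of structure.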

Alright. It's the moment of truth. Let's see if this works as expected.

$ gclient sync
Syncing projects: 100% (1/1), done.
$ ls -d my_project
my_project

Success! Note that my_project doesn't have .DEPS.git file, but gclient didn't mind. Next time, I'm going to try and add some dependencies.

Was this easier than using git clone? No, of course not. However, I have a sneaking suspicion that figuring out how to create dependencies in git, automatically running scripts upon checkout, and other things might be easier to do with gclient. The rule of thumb I like to follow is that if it's hard to begin using something, then eventually hard tasks will be easier to do using it. That's one of the reasons I use VIM. :)

- vmpstr

Friday, August 15, 2014

Setting up a new project with a new build system.

Hi.

This is my first post in what I hope will be a series of tutorials on setting up a build system for a new project. Note that I am writing this with basic knowledge of how Makefiles work, some experience using the Chromium build system, and not much more. As such, I'm not claiming that this series will be a correct, comprehensive guide to how to do things. This will just be a log of what I have personally done and what worked for me.

This post is simply an introduction to what I have and what I want to accomplish.

Currently, I'm a Chromium developer, so I use the Chromium build system. This means depot_tools (gclient in particular), ninja instead of make, and gyp/gn files as files that specify targets and source files. From what I have seen, these tools are fairly powerful. The files they generate are readable, the code compiles well :)

Two things I like about it in particular:

  1. Separate binaries for unittests. These are specified as separate targets and produce separate executables that, when run, execute specified tests. Chromium uses Google Test libraries, which is what I plan to use as well.
  2. Ability to compile for different platforms. The magical thing here is that the same build system is used for all operating systems (except for maybe iOS). This means that as a contributor, I don't need to worry about hacking some makefile in multiple places to ensure that everything works for all platforms. I think the magic is in the gyp or gn files.

Those two things, among others, are the reasons I decided to convert my toy projects from Makefiles to the gclient/gn build system. (By the way, if there's an official name for the build system that I can use, please let me know.)

As a developer, what I do best is programming. For me, this means writing C++ code, ensuring it compiles and links into an executable, and produces results that I want. However, one thing always bugged me about starting new projects: setting up the build system. I am currently working on some toy projects with no other goal but learning more of C++11 and I'm facing the same challenge again.

I have my base project checked in to git. It has some external library dependencies, such as googletest, zlib, libpng, etc. I currently just have those checked in to the code as well, under "third_party" directory. Additionally, I have a bunch of Makefiles that build the main project, as well as static libraries from third_party. The makefiles also link things correctly and are even able to run some tests. However, it's all a big unmaintainable mess.

In particular, every time I have to edit a Makefile, I immediately get a headache, struggle for half an hour to remember why I wrote the Makefile the way I did, then hack something together, test it, inevitably break some other target, and commit. Later, I go back and fix whatever I broke hoping I don't break more things.

So, I decided to document my exploration into something better. I want to be able to explain to a person looking at my code and build system for the first time what is going on. Moreover, I want to understand every line of what I write in my build system and the reason for me writing it. I don't want to do any more hacks.

As a disclaimer, I fully recognize that there are people out there whose Make-foo is awesome and they can do magic. I'm not one of these people.

Here are some conventions that I will use:

  • I will use example.com as a host where my git repos are hosted.
  • I will try my best to document exactly what I type, even if it's incorrect (of course, I will explain the thing that worked later).
  • I will try to come up with more bullet points if I decide to use bullet points.
  • Apparently, I will also have to learn HTML in order to use this blogger properly.

- vmpstr