libidf I.

Motivation

Wouldn’t it be nice to see things like this just work?

<sysctl>
  <entry key="hw.acpi.battery.life" value="72" />
  <entry key="hw.acpi.battery.time" value="315" />
  ...
</sysctl>

or …

{
  "lo0": {
    "metric": 0,
    "mtu": 16384,
    "inet": [127, 0, 0, 1],
    ...
  }
}

Simply put: machine-readable output of (non-)standard utilities. Just imagine every command-line utility on your UNIX system whose output might be useful to some programmer out there, emitting that output in a form a program can easily crunch. Possible examples are:

  • ifconfig
  • ctfdump
  • sysctl
  • find & ls
  • netstat, procstat
  • ping, traceroute
  • insert your tool here

If this sounds even a bit interesting and important, continue reading.

Problem

You see, I am not the only one to think of this; take this repository as an example. Don’t get me wrong, it is a nice piece of software. The problem with such an approach is scalability: if I were to write a JSON exporter for every one of the utilities mentioned above, I might as well quit university and work on that full time. Now let’s imagine there is another company that loves XML, or maybe C data type notation, for exchanging data. For every format we add, the code of these utilities gets bulkier, and the original purpose of the program ends up hidden in hundreds of lines of code emitting your favourite format. If we denote the number of formats F, we might say we have a 1:F approach.

The same goes for adding a new utility to your arsenal of machine-readable-output-creating programs. You will need to teach it to speak every format you fancy. Writing the tenth YAML converter must be fun. Not. If we denote the number of utilities U, we might say we have a U:1 approach.

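So, what do we want exactly? U:F, meaning any of the U utilities speaking any of the F formats, without having to write U × F converters.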

P.S.: Do not get me started on the situation where your favourite XML library is no longer supported, or has changed its license to an incompatible one, and you have to rewrite every one of your tools.

Solution

We want to write a YAML/JSON/XML/… converter only once. And, maybe even more importantly, we want to infect the source of each utility with only one piece of data-emitting code. Where does that leave us? We need an intermediate data format. An Intermediate Data Format Library, to be precise; let’s call it libidf.
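
The idea can be sketched in a few lines of C. Everything here is hypothetical: the names idf_entry, idf_emit_json and idf_emit_xml are invented for this illustration and are not the libidf API. A utility fills in one intermediate structure, and each output format is written exactly once, as an emitter over that structure. No escaping of special characters is attempted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical intermediate representation: a flat list of key/value pairs. */
struct idf_entry {
  const char *key;
  const char *value;
};

/* Append s to out and advance *used; returns -1 if it does not fit. */
static int
idf_puts(char *out, size_t size, size_t *used, const char *s)
{
  size_t len = strlen(s);

  assert(out != NULL && used != NULL);
  if (*used + len + 1 > size)
    return -1;
  memcpy(out + *used, s, len + 1);
  *used += len;
  return 0;
}

/* JSON emitter, written once for all utilities. Returns 0, or -1 if the
 * buffer is too small (its content is then unspecified). */
int
idf_emit_json(const struct idf_entry *e, size_t n, char *out, size_t size)
{
  size_t used = 0;
  int bad = idf_puts(out, size, &used, "{");

  for (size_t i = 0; i < n && !bad; i++) {
    bad |= idf_puts(out, size, &used, i ? ", \"" : "\"");
    bad |= idf_puts(out, size, &used, e[i].key);
    bad |= idf_puts(out, size, &used, "\": \"");
    bad |= idf_puts(out, size, &used, e[i].value);
    bad |= idf_puts(out, size, &used, "\"");
  }
  bad |= idf_puts(out, size, &used, "}");
  return bad ? -1 : 0;
}

/* XML emitter over the very same data; same return convention. */
int
idf_emit_xml(const struct idf_entry *e, size_t n, char *out, size_t size)
{
  size_t used = 0;
  int bad = idf_puts(out, size, &used, "<idf>");

  for (size_t i = 0; i < n && !bad; i++) {
    bad |= idf_puts(out, size, &used, "<entry key=\"");
    bad |= idf_puts(out, size, &used, e[i].key);
    bad |= idf_puts(out, size, &used, "\" value=\"");
    bad |= idf_puts(out, size, &used, e[i].value);
    bad |= idf_puts(out, size, &used, "\" />");
  }
  bad |= idf_puts(out, size, &used, "</idf>");
  return bad ? -1 : 0;
}
```

With this split, a utility such as sysctl would only build the idf_entry array, and adding a new format would mean adding one more emitter without touching any utility.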

More about the design and implementation of this library in the next posts. Stay tuned!


Idea: ctfcompress

Maybe an equivalent of this tool already exists, but I was not able to find one on Google. I have had no time to dig up the exact moment in the CTF toolchain where the data gets compressed. But it might be a good idea to make this a standalone stage and thus create ctfcompress, a tool to (de)compress the CTF data. Usage would be very simple: it takes a parameter, either -d or -c, and a set of object files to perform the magic on.


Uniquification

Fancy word, don’t you think? Well, it is a fancy feature too!

Now let’s imagine this scenario: we have a kernel object full of ubiquitous types like uid_t, ushort, struct proc and so on. Alongside this kernel, we have a dozen kernel modules built, mostly sharing numerous types with the kernel base. Needless to say, this can also apply to some userland application setups. Without any post-processing, both the modules and the kernel would carry their own generated CTF data. You can probably see now that this causes unnecessary duplication. And that is exactly where uniquification comes into play.

It is done using the ctfmerge tool. Normally, we use it to take a set of objects and merge them into one while making sure there is only one occurrence of each type. But this alone would not be enough: we also want to specify the parent file that we uniquify against. This saves a large portion of the space (I have no estimate at the moment; there should be a post with charts in the future). Now, we need to realise this is not without problems: just imagine that we want to update the parent object without touching the child objects at all. Worry not, there is a special switch in ctfmerge that provides an additive merge. One of the next blog posts will be exactly about this.
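
To make the idea concrete, here is a toy model of uniquification in C. It is a sketch only: this example assumes a type is identified by its name alone, which is a simplification made for illustration, and the function name uniquify is invented here and says nothing about how ctfmerge works internally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if name occurs in the parent's list of type names. */
static int
parent_has_type(const char *const *parent, size_t parent_count,
    const char *name)
{
  for (size_t i = 0; i < parent_count; i++)
    if (strcmp(parent[i], name) == 0)
      return 1;
  return 0;
}

/*
 * Remove from child every type name that the parent already defines.
 * The child array is compacted in place and the new count is returned.
 */
size_t
uniquify(const char *const *parent, size_t parent_count,
    const char **child, size_t child_count)
{
  size_t kept = 0;

  assert(parent != NULL && child != NULL);
  for (size_t i = 0; i < child_count; i++)
    if (!parent_has_type(parent, parent_count, child[i]))
      child[kept++] = child[i];
  return kept;
}
```

For a kernel defining uid_t, ushort and struct proc, a module that uses uid_t, ushort and one private struct would keep only the private struct; the shared types are looked up in the parent instead.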

Looking for any parent references:

for path in /boot/kernel/*.ko
do
  ctfdump "${path}" | grep "parname"
done | sort | uniq

It seems that none of my kernel objects produced during the CTF kernel build contain any references to merging: the output of this command is either “cth_parname = (anon)” or a self-reference to the CTF header struct. Further evidence comes from simply looking at the kernel objects with ctfdump: each one of them contains type definitions for the common int or void.

After searching for usages of ctfmerge in the system makefiles, I found that it always appears only inside the body of an if that checks the MK_CTF variable against the value “no”. It seems that the file /usr/share/mk/bsd.own.mk contains the definition of this variable and sets it to “no”. I will have to play with this some more.


CTF header I.

Now that we have the CTF data at our disposal, we need to parse it and get meaningful information out of it. As usual, every format contains a header that consists of data-wide preferences and stuff like version or a starting magic number.

Indeed, CTF data starts with the 16-bit magic number 0xCFF1. I suspect this is some kind of pun that tries to recreate the name of the format. 0xCC1F would serve as a much better number, since “CTF” actually stands for Compact C Type Format.

The next byte represents the version of the format. Currently, versions 1 and 2 exist. Every one of my kernel objects (FreeBSD 11) that I checked has version 2; therefore, for libctf I would prefer to start with version 2 and maybe discard version 1, as it is old and unused in the project.

The fourth byte is treated as a place for 8 flags, even though I was able to find only one: the compress flag. It means that the actual CTF data is compressed to save some space. One of the remaining bits could be used to signal the endianness of the data; the community has requested this more than once as a feature lacking in the current library.

The thing I noticed when looking at the current implementation is that it uses types like short, char and so on. Maybe I am overcautious or not knowledgeable enough, but these types do not have a fixed size. OK, char is always exactly one byte, but short is not guaranteed to be 2 bytes wide. I understand the reasons behind these loosely sized C types, but sometimes we really need to be able to rely on an exact width. To address this problem, the C99 standard introduced the headers stdint.h and inttypes.h, which provide types like uint8_t or int32_t (and many others). As far as I know, they are ordinary header files, shipped with the compiler or the C library, that contain the correct typedefs for the platform. The realisation of the types aside, I would like to use them in my libctf implementation to ensure the proper sizes in all situations.

Transforming this into code will look something like this:

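#include <stdint.h>

/* The first four bytes of every CTF header. */
struct ctf_header_preface
{
  uint16_t magic;   /* CTF_MAGIC */
  uint8_t version;  /* format version, 1 or 2 */
  uint8_t flags;    /* bit flags, e.g. CTF_COMPRESSED */
};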

Also, we need to define constants representing the expected input:

#define CTF_MAGIC 0xCFF1
#define CTF_VERSION 2

/* flags */
#define CTF_COMPRESSED 1
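
Putting the struct and the constants together, checking the first four bytes of a CTF buffer might look like the sketch below. The function name ctf_check_preface and its status codes are hypothetical, not part of any existing libctf. The code also assumes that the data was produced on a machine with the same byte order as the reader, since the format has no endianness flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CTF_MAGIC 0xCFF1
#define CTF_VERSION 2
#define CTF_COMPRESSED 1

struct ctf_header_preface
{
  uint16_t magic;
  uint8_t version;
  uint8_t flags;
};

/* The preface occupies 4 bytes; make sure the compiler added no padding. */
_Static_assert(sizeof(struct ctf_header_preface) == 4, "unexpected padding");

/* Hypothetical status codes for this sketch. */
enum ctf_preface_status {
  CTF_PREFACE_OK,
  CTF_PREFACE_TOO_SHORT,
  CTF_PREFACE_BAD_MAGIC,
  CTF_PREFACE_BAD_VERSION
};

/*
 * Copy the preface out of a raw buffer and validate it.
 * Assumes the buffer is in the host's byte order.
 */
enum ctf_preface_status
ctf_check_preface(const void *data, size_t size,
    struct ctf_header_preface *out)
{
  assert(data != NULL && out != NULL);

  if (size < sizeof(*out))
    return CTF_PREFACE_TOO_SHORT;

  /* memcpy avoids unaligned reads and strict-aliasing problems. */
  memcpy(out, data, sizeof(*out));

  if (out->magic != CTF_MAGIC)
    return CTF_PREFACE_BAD_MAGIC;
  if (out->version != CTF_VERSION)
    return CTF_PREFACE_BAD_VERSION;
  return CTF_PREFACE_OK;
}
```

After a successful check, testing out->flags & CTF_COMPRESSED tells the caller whether the rest of the data has to be decompressed before parsing.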

To be continued.


New relevant documentation and ideas

Pedro Giffuni sent me some great reading material regarding CTF. The links can be found on the official wiki of the project.

Many thanks!


Hunting for the .SUNW_ctf

In order to actually do something with the CTF data, we must first obtain it. Its location, provided everything was compiled and converted appropriately, is the ELF section named “.SUNW_ctf”. So I started today’s research by listing the sections one by one and comparing each name stored in the ELF string table to the section name we are looking for. After a successful match, we proceed to get hold of the actual data. Luckily, the libelf API is well designed and all of this was really straightforward. The important part of the code:


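/* Walk all sections, looking for the one that holds the CTF data. */
while ((section = elf_nextscn(elf, section)) != NULL)
{
  gelf_getshdr(section, &section_header);

  /* Section names live in the section header string table. */
  char *section_name = elf_strptr(elf, elf_header->e_shstrndx,
      section_header.sh_name);
  if (section_name == NULL)
    continue;

  printf("%s\n", section_name);

  if (strcmp(section_name, ".SUNW_ctf") == 0)
  {
    /* Dump the raw section bytes, one character at a time. */
    Elf_Data *data = elf_getdata(section, NULL);
    for (unsigned int i = 0; data != NULL && i < data->d_size; i++)
      printf("%c", ((char*)data->d_buf)[i]);
  }
}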

The whole code can be found here: elfctf.c (disclaimer: the code is nowhere near an ideal state, there are no comments and no error checking, it is just a proof of concept).

After fixing the absence of the elements mentioned above, this code may be used in the rewrite of the CTF toolset, starting with ctfdump. I would like to keep the libelf code out of libctf for a simple reason: keeping the library as light as possible when it comes to dependencies.

A small thought near the end: since there is no longer any real connection with Sun Microsystems or Solaris, it might be suitable to rename this section to plain “.ctf”. There are not that many consumers of the CTF data that would need to change, most notably DTrace and the CTF toolset (ctfdump, ctfconvert and ctfmerge).


My FreeBSD setup

For some time now, I have happily been using OS X on my MacBook as my web-browsing, film-watching, BZFlag-playing operating system. But for my development needs, I use FreeBSD, with occasional experimenting (or just portability checking) on OpenBSD.

I would like to share my FreeBSD setup with you.

First of all, it runs in VirtualBox, in headless mode. What does that mean? Simply put, there is no window, and therefore VirtualBox eats fewer resources. I have set up a port-forwarding rule on localhost, so that TCP connections to port 3022 on the host OS are forwarded to port 22 on the guest OS (in this case FreeBSD). By default, the FreeBSD installation runs sshd. Once everything is loaded, I just run ssh -p 3022 root@localhost and get a nice terminal inside my iTerm2!
