
Instantiating a GraalPy (GraalVM) Project with Gradle to Access the Beancount `pip` Package in Java

Guide on setting up GraalVM's polyglot interfacing capabilities with Python and `pip` packages.

Some time ago I stumbled upon a use case that required me to process some data from Beancount files. These files follow the Plain Text Accounting philosophy for personal finances. The Beancount project itself is written in Python, but I wanted to access my data via the Java platform. There were always ways to do this, but in the past couple of years the concept of polyglot programming has gained much traction in the Java community because of the advance of GraalVM.

This guide will teach you how to set up GraalVM and GraalPy on your Linux or macOS machine (I am using WSL). After this, we create a Gradle project with the Native Build Tools plugin. When the setup is completed, we will install the Beancount package using pip and call a (Python) library function from our Java code.

Installing GraalVM and GraalPy

There exist multiple ways to install the GraalVM runtime environment. Because of the otherwise complicated setup process, I have chosen to use SDKMAN! and pyenv. Please follow the installation instructions on their respective pages.

First, we want to install GraalVM itself. We can do this using SDKMAN!. Please ensure that you install the distribution that conforms to your licence requirements. I will be referring to the community editions in this article.

sdk install java 21.0.1-graalce

After this is completed, we can move on to installing GraalPy. This Python 3.10 compliant runtime makes polyglot programming possible.

pyenv install graalpy-community-23.1.0

To access the GraalPy environment we can use pyenv shell graalpy-community-23.1.0.

You can verify the installations by running the following commands:

$ java --version
java 21.0.1 2023-10-17
Java(TM) SE Runtime Environment Oracle GraalVM 21.0.1+12.1 (build 21.0.1+12-jvmci-23.1-b19)
Java HotSpot(TM) 64-Bit Server VM Oracle GraalVM 21.0.1+12.1 (build 21.0.1+12-jvmci-23.1-b19, mixed mode, sharing)
$ pyenv shell graalpy-community-23.1.0
$ graalpy --version
GraalPy 3.10.8 (GraalVM CE Native 23.1.0)

Creating a Virtual Environment

We can now create a virtual environment to store our dependencies:

graalpy -m venv venv

In this example our virtual environment is simply called venv.

To use pip and other commands in this virtual environment, we need to "activate" it. This can be done by sourcing the correct shell script for your platform, in our case: source ./venv/bin/activate.
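
To confirm that the activation worked, you can ask the interpreter itself. The following is a minimal sketch, not part of the original setup; the function name in_virtualenv is made up for illustration. It relies on the standard venv behaviour that sys.prefix differs from sys.base_prefix inside a virtual environment.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    # Shows which interpreter is actually being used.
    print("interpreter:", sys.implementation.name, sys.executable)
```

Run it with the python (or graalpy) command right after sourcing the activate script. If it reports that no virtual environment is active, pip would install packages somewhere else than intended.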

Instantiating Your Gradle Project

You can instantiate your Gradle project any way you like, or reuse an existing project.

This can be done using gradle init.

The tutorial will assume a Java application project using the Kotlin DSL.

First of all, we want to make sure we have the application and Native Build Tools plugins installed:

plugins {
  application
  id("org.graalvm.buildtools.native") version "0.9.28"
}

We also need to define the following dependencies (see footnote 1):

dependencies {
  implementation("org.graalvm.polyglot:polyglot:23.1.0")
  runtimeOnly("org.graalvm.polyglot:python-community:23.1.0")
  runtimeOnly("org.graalvm.polyglot:llvm-community:23.1.0")
}

We can define our main class using the application plugin:

application {
  mainClass.set("org.example.Main")
}

Installing Beancount

When I first tried to install Beancount using pip install beancount==2.3.6 I got the following error:

Compile failed: command 'cc' failed: No such file or directory
    ...
    creating tmp
    cc -I/usr/include/libxml2 -I/usr/include/libxml2 -c /tmp/xmlXPathInitsdjoqzb0.c -o tmp/xmlXPathInitsdjoqzb0.o
    *********************************************************************************
    Could not find function xmlCheckVersion in library libxml2. Is libxml2 installed?
    *********************************************************************************
    error: command 'cc' failed: No such file or directory
    [end of output]
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for lxml
Failed to build lxml
ERROR: Could not build wheels for lxml, which is required to install pyproject.toml-based projects

In this case I was missing cc. On Ubuntu this is included with the build-essential package. For good measure, I made sure to install all packages that might be necessary to install lxml:

sudo apt-get install libxml2-dev libxslt-dev python3-dev build-essential

Different system dependencies might be necessary, depending on your operating system and the package you are trying to install. Always carefully read the error messages produced by pip install, and check the package documentation to see if you are missing any prerequisites.
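
Before starting a long pip install, it can save time to check whether the build tools are present at all. The sketch below uses only the standard library. The tool list is an assumption based on the error above (cc) and on the config scripts that the libxml2 and libxslt development packages typically ship; it is not an authoritative list of what lxml needs, and the names missing_tools and ASSUMED_BUILD_TOOLS are hypothetical.

```python
import shutil

# Assumed tool list for building lxml from source; adjust it for your
# platform and for the package you are installing.
ASSUMED_BUILD_TOOLS = ("cc", "xml2-config", "xslt-config")


def missing_tools(tools=ASSUMED_BUILD_TOOLS):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed build tools were found.")
```

A tool that is reported as missing is only a hint. The error messages produced by pip install remain the most reliable source of information.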

It is important to note that installing packages in your virtual environment might take longer than you're used to. This is because the environment also checks if any patches need to be made to the package or its (sub)dependencies to make it cooperate correctly with GraalPy.

The Test Case

To keep it simple, in Beancount, a ledger is the combination of a set of files that contain certain directives.

If this concept is something you want to investigate further, you might be interested in the Getting Started with Beancount article by the creator of Beancount. Another good starting point is plaintextaccounting.org.

One of these directives is the commodity directive. This directive defines a commodity (e.g., currency or stock) that can be referenced in the rest of your ledger.

In our example, we want to read a ledger, extract its directives, and find all commodity symbols (e.g., EUR).

test.beancount

2023-12-12 commodity EUR
2023-12-12 commodity GBP
2023-12-12 commodity USD
2023-12-12 commodity CHF

In this example we define the commodities EUR, GBP, USD, and CHF.

Our goal is to create a String[] with these symbols.

In Python these symbols can be found programmatically using the following script:

from beancount import loader
from beancount.core.data import Commodity

# In this tuple, `entries` is a list of directives
(entries, errors, option_map) = loader.load_file('test.beancount')

symbols = [entry.currency for entry in entries if isinstance(entry, Commodity)]

print(symbols) # Prints ['EUR', 'GBP', 'USD', 'CHF']
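
The script above ignores the errors list returned by loader.load_file. A slightly more defensive variant could look like the sketch below. The function names commodity_symbols and load_symbols are made up for this example, and aborting on any reported error is an assumption about what you want, not something Beancount requires.

```python
def commodity_symbols(entries, commodity_type):
    """Return the currency of every entry that is an instance of commodity_type."""
    return [entry.currency for entry in entries if isinstance(entry, commodity_type)]


def load_symbols(path):
    """Load a ledger and return its commodity symbols, failing on load errors."""
    # Imported lazily so that commodity_symbols stays usable without Beancount.
    from beancount import loader
    from beancount.core.data import Commodity

    entries, errors, _option_map = loader.load_file(path)
    if errors:
        # Assumption: any reported error should abort; relax this if needed.
        raise ValueError(f"{len(errors)} error(s) while loading {path}")
    return commodity_symbols(entries, Commodity)
```

With the example ledger, load_symbols('test.beancount') is expected to return the same list as the script above. Keeping the filter in its own function also makes it possible to try it out with stand-in objects when Beancount is not installed.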

Folder Structure for Our Example

Our project structure should now look somewhat similar to the following:

.
├── build.gradle.kts
├── gradle
│   └── wrapper
│       ├── gradle-wrapper.jar
│       └── gradle-wrapper.properties
├── gradlew
├── gradlew.bat
├── settings.gradle.kts
├── src
│   └── main
│       └── java
│           └── org
│               └── example
│                   └── Main.java
├── test.beancount
└── venv
    ├── bin
    │   ├── activate
    │   ├── bean-check
    │   ├── pip
    │   └── ...
    ├── include
    ├── lib
    │   └── python3.10
    │       └── site-packages
    │           ├── __pycache__
    │           ├── beancount
    │           │   ├── __init__.py
    │           │   └── ...
    │           └── ...
    └── ...

If this is the case, we can start work on implementing the final solution.

Implementing Our Logic in Java

When everything is set up correctly, it is trivial to call this exact code in Java. We are going to use the GraalVM Polyglot API to run our Python code in the JVM.

We start by creating a Context object. These objects are built using the Context.Builder builder class:

final Context.Builder graal = Context.newBuilder("python", "llvm")
  .option("python.Executable", "venv/bin/graalpython")
  .option("python.ForceImportSite", "true")
  .allowIO(IOAccess.ALL)
  .allowNativeAccess(true);
try (Context ctx = graal.build()) {
  // Your code
}

Within the try-with-resources block we can use the Context to interact with our guest languages (in our case these are Python and LLVM). The Context is closed automatically when the block ends.

We need to allow native access because the Beancount package internally uses the struct (and thus pack) builtins. These builtins are implemented as a C module, which requires native access on top of IO access. Otherwise, we will get the following error: ImportError: cannot import name 'pack' from 'struct'.
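
If you run into this error, it can help to test the struct module in isolation before loading Beancount. The snippet below is a sketch (the function name struct_roundtrip is hypothetical). Run it with graalpy directly, or pass its source to ctx.eval("python", ...), to see whether the import itself succeeds with your context configuration.

```python
import struct


def struct_roundtrip(value=258):
    """Pack an unsigned 16-bit integer (little-endian) and unpack it again."""
    packed = struct.pack("<H", value)
    (unpacked,) = struct.unpack("<H", packed)
    return packed, unpacked


if __name__ == "__main__":
    # For the default value 258 (0x0102) this prints (b'\x02\x01', 258).
    print(struct_roundtrip())
```

If this already fails inside the Context, the problem lies in the context configuration rather than in Beancount.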

To make calls to our guest language (Python), we can use the Context#eval(String, String) method:

// First, import the required packages
ctx.eval("python", """
    from beancount import loader
    from beancount.core.data import Commodity
    """);
// Then, load the ledger
ctx.eval("python", "(entries, errors, option_map) = loader.load_file('test.beancount')");
// Find all symbols
final Value pySymbols = ctx.eval("python", "[entry.currency for entry in entries if isinstance(entry, Commodity)]");
// Convert them to their Java representation
final String[] symbols = pySymbols.as(String[].class);
// Finally, print the array (in the Java world)
System.out.println(Arrays.toString(symbols));

Our full solution now looks like this:

/src/main/java/org/example/Main.java

package org.example;

import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Value;
import org.graalvm.polyglot.io.IOAccess;

import java.util.Arrays;

public class Main {
  public static void main(String[] args) {
    final Context.Builder graal = Context.newBuilder("python", "llvm")
          .option("python.Executable", "venv/bin/graalpython")
          .option("python.ForceImportSite", "true")
          .allowIO(IOAccess.ALL)
          .allowNativeAccess(true);
    try (Context ctx = graal.build()) {
      ctx.eval("python", """
            from beancount import loader
            from beancount.core.data import Commodity
            """);
      ctx.eval("python", "(entries, errors, option_map) = loader.load_file('test.beancount')");

      final Value pySymbols = ctx.eval("python", "[entry.currency for entry in entries if isinstance(entry, Commodity)]");
      final String[] symbols = pySymbols.as(String[].class);

      System.out.println(Arrays.toString(symbols));
    }
  }
}

This prints [EUR, GBP, USD, CHF], exactly what we want.

Verify that you are running the application using the executable built with the nativeCompile task (called nativeBuild in earlier releases of the plugin), or that it is run using the nativeRun task. The Native Build Tools plugin offers an assortment of tasks. If you use different tasks (e.g., run), you will get an error because you are running the application like a "normal" Java application. This does not offer the polyglot programming capabilities offered by GraalVM.

Build Time

On my relatively powerful machine, it takes almost 8 minutes to compile this example.

When you need to interface with Python quickly and securely, GraalPy and GraalVM can offer advantages, but you should take measures to ensure that these long compilation times do not slow down development.

Conclusion

In this post, we learned how to set up GraalVM and its Python interface (GraalPy) with the Gradle build tool.

We looked at how to install packages using the pip package manager, and identified some ways to resolve issues as they occur during package installation in the context of GraalVM.

The polyglot programming capabilities of GraalVM are very powerful when used correctly. It is, however, sometimes challenging to find the right way to set up projects that make use of certain features.

What can these polyglot capabilities and powerful interfacing techniques bring to your business or project?

The example project is available as a repository on GitHub. The repository also includes a Docker image that shows how we can leverage GraalVM, GraalPy, and pip in containerized environments.

Footnotes

  1. The LLVM language dependency is necessary for Beancount specifically; your project might not need it.


The contents of this article are licensed under the CC BY-NC-SA 4.0 license.