postgrestutils

    A very basic PostgREST client and utils

    Usage

The client is intended to be used as a context manager via postgrestutils.Session. In fact it will not work otherwise, because the underlying requests.Session is only created on entering the context manager.

    import postgrestutils
    
    
    params = {
        "select": "id,forename",
        "forename": "eq.Jebediah"
    }
    
# this will send a request to 'POSTGREST_UTILS_BASE_URI/kerbals?select=id,forename&forename=eq.Jebediah'
with postgrestutils.Session() as s:
    res = s.filter('kerbals', params=params)

By default a new postgrestutils.Session takes the settings discussed in Setup into account, so there is no need to specify base_uri or token explicitly unless your project uses more than one API or database role.

Additionally, postgrestutils.Session takes schema: Optional[str] = None, parse_dt: bool = True and count: postgrestutils.Count = postgrestutils.Count.NONE (some of which are explained later on). These options are session defaults and may be overridden on a per-request basis, e.g.:

    import postgrestutils
    
    
    with postgrestutils.Session(parse_dt=False) as s:
        print(s.get('random_datetime'))  # makes request using parse_dt=False
        print(s.get('random_datetime', parse_dt=True))  # makes request using parse_dt=True
        print(s.get('random_datetime'))  # makes request using parse_dt=False
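The per-request override shown above is a common Python pattern; it can be modelled with a sentinel default. This is a conceptual sketch of the pattern, not postgrestutils' actual code:

```python
_UNSET = object()  # sentinel: distinguishes "not passed" from an explicit False/None


class DemoSession:
    """Stand-in showing how a session default can be overridden per request."""

    def __init__(self, parse_dt=True):
        self.parse_dt = parse_dt  # session-wide default

    def effective_parse_dt(self, parse_dt=_UNSET):
        # fall back to the session default unless the caller passed a value
        return self.parse_dt if parse_dt is _UNSET else parse_dt


s = DemoSession(parse_dt=False)
print(s.effective_parse_dt())               # False (session default)
print(s.effective_parse_dt(parse_dt=True))  # True (per-request override)
```

A plain default of None would not work here, since None is itself a meaningful value for some options; hence the sentinel.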

    Setup

Settings are configured as django settings when postgrestutils is used in a django project, or as environment variables otherwise. Django projects additionally need to add 'postgrestutils' to the INSTALLED_APPS setting.

    Shared settings

Setting                   Description
POSTGREST_UTILS_BASE_URI  base URI of the PostgREST instance to use
POSTGREST_UTILS_JWT       JWT for the corresponding database role and PostgREST instance
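In a non-django project these are plain environment variables; a minimal sketch with placeholder values:

```python
import os

# Placeholder values for illustration; set these in your real environment.
os.environ.setdefault("POSTGREST_UTILS_BASE_URI", "https://pgrest.example.com/")
os.environ.setdefault("POSTGREST_UTILS_JWT", "placeholder-jwt")

base_uri = os.environ["POSTGREST_UTILS_BASE_URI"]
token = os.environ["POSTGREST_UTILS_JWT"]
print(base_uri)  # https://pgrest.example.com/
```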

    Django-only settings

    Setting Description
    POSTGREST_UTILS_AUTOFETCH account columns to fetch on login (comma-separated str)

    Making requests

    postgrestutils tries to be as intuitive and pythonic as possible while also being efficient. In order to do so it combines several concepts such as lazy evaluation, caching and counting strategies.

    Understanding these concepts allows you to write the most efficient code for your specific use case. If you're already familiar with how django querysets work this should feel fairly natural to you. There are however a few differences that will be explained in detail below.

    Lazy evaluation

Akin to django querysets, postgrestutils has a JsonResultSet that is returned from calls to .filter() without making any API call yet. If you're familiar with django's rules for evaluation this list won't surprise you. Since there are a few subtle differences, however, here is what will cause evaluation of a JsonResultSet:

    • Iteration. A JsonResultSet is iterable and will fetch all elements from the API the first time you iterate over it.
    • Slicing. This will fetch the elements in the specified range.
    • repr(). As a convenience implementation for interactive interpreter sessions this will fetch the first 20 elements.
    • len(). Unsurprisingly this returns the count of the requested table. Depending on the counting strategy this has different implications such as the cache being populated.
    • list(). This can be useful to force evaluation of a JsonResultSet.
    • bool(). Using a JsonResultSet in any boolean context will evaluate it.
    • Using the .get() method on a session. Getting some lazy object when explicitly requesting a single element doesn't make much sense. Like django's Model.objects.get() this will return the requested element or raise a postgrestutils.ObjectDoesNotExist/postgrestutils.MultipleObjectsReturned if none or multiple objects were found.
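The evaluation triggers above can be illustrated with a tiny stand-in class; this is a conceptual sketch, not JsonResultSet's actual implementation:

```python
class LazyResultSet:
    """Conceptual stand-in for JsonResultSet: defers fetching until needed."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable that would hit the API
        self._cache = None

    def _evaluate(self):
        if self._cache is None:
            self._cache = list(self._fetch())
        return self._cache

    def __iter__(self):   # iteration evaluates
        return iter(self._evaluate())

    def __len__(self):    # len() evaluates (Count.NONE behaviour)
        return len(self._evaluate())

    def __bool__(self):   # any boolean context evaluates
        return bool(self._evaluate())


calls = []
rs = LazyResultSet(lambda: calls.append(1) or [{"id": 1}, {"id": 2}])
# No API call has happened yet; iterating triggers exactly one fetch,
# and len() then re-uses the cached result.
ids = [row["id"] for row in rs]
count = len(rs)
print(calls)  # [1] — a single fetch despite iteration and len()
```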

    Pagination

    Remember the part about postgrestutils trying to be intuitive and pythonic? Do you know about Python's slicing notation? Great, you already know how pagination works. Just to be sure here is a snippet of pagination in action:

    >>> with postgrestutils.Session() as s:
    ...     business_roles = s.filter('business_role')
    ...     business_roles[:3]  # fetches the first 3 business roles
    ...     business_roles[3:6]  # fetches the next 3 business_roles
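Under the hood PostgREST paginates with limit and offset query parameters; here is a sketch of how a Python slice could translate into them (illustrative only, not postgrestutils' internals):

```python
def slice_to_params(sl: slice) -> dict:
    """Translate a slice like [3:6] into PostgREST limit/offset params."""
    start = sl.start or 0
    params = {"offset": str(start)}
    if sl.stop is not None:
        params["limit"] = str(sl.stop - start)
    return params


# business_roles[3:6] would request offset=3&limit=3
print(slice_to_params(slice(3, 6)))     # {'offset': '3', 'limit': '3'}
print(slice_to_params(slice(None, 3)))  # first page: {'offset': '0', 'limit': '3'}
```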

    Caching

    Also like django querysets there is a cache to minimize API calls. Here's a short snippet explaining the gist of it:

    # Bad: Fetches the same data from the API twice
    >>> with postgrestutils.Session() as s:
    ...     print([role['id'] for role in s.filter('business_role')])
    ...     print([role['name'] for role in s.filter('business_role')])
    
    # Good: Uses the cache resulting in only a single API request
    >>> with postgrestutils.Session() as s:
    ...     print([role['id'] for role in s.filter('business_role')])  # fetches all elements into the cache
    ...     print([role['name'] for role in s.filter('business_role')])  # re-uses the cached elements

When results are not cached

    There are a few cases where a JsonResultSet will not cache results:

• Indexing and slicing. If the cache is not yet populated, indexing and slicing (even on the same index or range) will each result in an API call.
    # without populated cache
    >>> with postgrestutils.Session() as s:
    ...     business_roles = s.filter('business_role')
    ...     business_roles[5]  # fetches the 6th element from the API
    ...     business_roles[5]  # fetches the 6th element from the API again
    
    # with populated cache
    >>> with postgrestutils.Session() as s:
    ...     business_roles = s.filter('business_role')
    ...     list(business_roles)  # fetches all elements from the API
    ...     business_roles[5]  # re-uses the cached elements
    ...     business_roles[5]  # re-uses the cached elements
    • repr(). Since this just returns a slice of itself the cache won't be populated.
• len() when using a counting strategy other than count=postgrestutils.Count.NONE. Those strategies are not required to fetch all elements in order to determine their length. More on counting strategies below.

A note on caching and len()

Since calling len() can be quite expensive, its result is also cached. Any subsequent calls re-use the cached count instead of making further API calls.

    Invalidating the cache

    If you have a JsonResultSet around that you want to re-use but need up-to-date data simply call the .refresh_from_pgrest() method on it. That will lazily refresh data from PostgREST by invalidating the cache. Your object will now behave as if you just created it.
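Conceptually the refresh is just cache invalidation: no request is made until the data is next needed. A sketch of the idea (not the library's code):

```python
class DemoCachedResult:
    """Illustrates lazy refresh: invalidation is cheap, fetching happens on next access."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = None

    def value(self):
        if self._cache is None:
            self._cache = self._fetch()  # fetch only when actually needed
        return self._cache

    def refresh_from_pgrest(self):
        self._cache = None  # no request is made here


fetches = []
res = DemoCachedResult(lambda: fetches.append(1) or "data")
res.value()                # first access fetches
res.refresh_from_pgrest()  # invalidate only
res.value()                # next access fetches again
print(len(fetches))        # 2
```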

    Counting strategies

    PostgREST currently offers multiple counting strategies. postgrestutils lets you decide on which to use by specifying the count kwarg on a session or passing it on a per-request basis to .get() and .filter(). While this document attempts to explain counting strategies sufficiently consulting the linked PostgREST documentation may be insightful at times.
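On the wire, PostgREST selects the counting strategy via the Prefer request header (count=exact, count=planned or count=estimated). A sketch of that mapping, using a local stand-in enum rather than the real postgrestutils.Count:

```python
import enum


class Count(enum.Enum):  # local stand-in mirroring postgrestutils.Count
    NONE = None
    EXACT = "exact"
    PLANNED = "planned"
    ESTIMATED = "estimated"


def count_headers(count: Count) -> dict:
    """Build the Prefer header PostgREST expects for a counting strategy."""
    if count is Count.NONE:
        return {}  # no Prefer header: PostgREST won't compute a count
    return {"Prefer": "count={}".format(count.value)}


print(count_headers(Count.EXACT))  # {'Prefer': 'count=exact'}
print(count_headers(Count.NONE))   # {}
```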

    Using count=postgrestutils.Count.NONE

    If you don't need to know the count for your request this is obviously a good counting strategy to choose. But what happens if you need the count and just call len() on your JsonResultSet anyway? This is again similar to what django querysets do. It will evaluate the JsonResultSet fetching all elements from the API into the cache and return the length of the cache.

    Using count=postgrestutils.Count.EXACT

    You've learned that count=postgrestutils.Count.NONE will count your elements just fine so why would you ever want to use this option? The reason is quite simple: Fetching all elements for a large table can be expensive; and unnecessarily so if you don't even need them. That's often the case when using pagination. You want to show a subset of all elements but also display how many pages with more elements there are. To do so you need the count of all elements and the first few elements depending on your page size. What you don't need however is all elements so why fetch them?

    This counting strategy allows you to get the count without fetching all elements. So what happens when calling len() on your JsonResultSet this time? postgrestutils will explicitly request the count for your request which will be cheaper for large tables.

Be careful with this strategy for very large tables, however, as obtaining an exact count can take a very long time, as explained in the PostgREST documentation.

    Using count=postgrestutils.Count.PLANNED

Now what? Your table is very large, postgrestutils.Count.EXACT takes too long and postgrestutils.Count.NONE is out of the question entirely.

    postgrestutils.Count.PLANNED to the rescue. Using this counting strategy you explicitly instruct the client to leverage PostgreSQL statistics collected from ANALYZEing tables. This will yield fairly accurate results depending on how often new statistics are collected.

    Using count=postgrestutils.Count.ESTIMATED

So postgrestutils.Count.NONE and postgrestutils.Count.EXACT are feasible for small tables. For very large tables those either take too long or require too much memory, leaving postgrestutils.Count.PLANNED as the only viable alternative. However postgrestutils.Count.PLANNED can lead to deviations that are quite notable for small tables. If only we could have the best of both worlds...

Enter postgrestutils.Count.ESTIMATED. The idea is quite simple: postgrestutils.Count.ESTIMATED uses the postgrestutils.Count.EXACT strategy up to a certain threshold, then falls back to the postgrestutils.Count.PLANNED strategy. That threshold is defined by the max-rows setting in your PostgREST configuration, which also limits the number of rows fetched per request.
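The fallback can be sketched as follows, with exact_count and planned_count standing in for the two underlying strategies:

```python
def estimated_count(exact_count, planned_count, max_rows):
    """Sketch of Count.ESTIMATED: exact up to max-rows, then the planner's estimate."""
    exact = exact_count()  # capped at max_rows by the server anyway
    if exact < max_rows:
        return exact           # small table: the exact figure is trusted
    return planned_count()     # large table: fall back to the planner's estimate


# Small table: the exact count wins; large table: the planner estimate wins.
print(estimated_count(lambda: 42, lambda: 40, max_rows=1000))        # 42
print(estimated_count(lambda: 1000, lambda: 123456, max_rows=1000))  # 123456
```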

    Filtering

The params dict is passed through to PostgREST, so the full filtering syntax (operators, ordering, embedding) documented in the PostgREST API reference is available: http://postgrest.org/en/stable/api.html
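As a quick reminder of the syntax, each key of the params dict is a column (or a reserved word like select or order) and each value is an operator-prefixed condition; Python's standard library shows the resulting query string:

```python
from urllib.parse import urlencode

params = {
    "select": "id,forename",
    "forename": "eq.Jebediah",  # eq. → equality; other operators include gt., lt., like., in.(...)
    "order": "id.desc",
}
query = urlencode(params)
print(query)  # select=id%2Cforename&forename=eq.Jebediah&order=id.desc
```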

    Schema switching

PostgREST v7.0.0 added support for schema switching. By setting db-schema to a comma-separated string, multiple schemas may be specified, with the first schema being the default. Assuming your PostgREST instance has db-schema = 'foo, bar' in its configuration:

    import postgrestutils
    
    
# 'kerbals' stands in for any endpoint available in the schema
with postgrestutils.Session(schema='bar') as s:
    s.filter('kerbals')  # uses the 'bar' schema
    s.filter('kerbals', schema=postgrestutils.DEFAULT_SCHEMA)  # uses the default schema ('foo')
    s.filter('kerbals', schema='foo')  # explicitly request the 'foo' schema
    s.filter('kerbals')  # still uses the session default, the 'bar' schema

    Django helpers

    custom user_account_fetched signal

    postgrestutils provides a custom signal called user_account_fetched which provides the current request and the account of the current user on login. To use this feature (and you really should 99.9% of the time) configure your settings accordingly by specifying the columns you need from the account endpoint:

    # settings.py
    POSTGREST_UTILS_AUTOFETCH = 'person_id,email'  # comma-separated string

When connecting to the signal you have to provide a callback function. That function is a great place to fetch additional information your app frequently needs and store it in the session. You can connect to it like so:

    # apps.py
    from django.apps import AppConfig
    from postgrestutils.signals import user_account_fetched
    from your_app.utils import your_callback_func
    
    
    class YourAppConfig(AppConfig):
        def ready(self, *args, **kwargs):
            super().ready(*args, **kwargs)
            user_account_fetched.connect(your_callback_func)

    Your callback function could look something like this:

    # utils.py
    import postgrestutils
    
    
    def your_callback_func(sender, **kwargs):
        request = kwargs['request']
        account = kwargs['account']
    # fetch some additional data your project needs frequently (person, memberships etc.)
    # ('person' and 'membership' are stand-in endpoint names)
    with postgrestutils.Session() as s:
        person = s.get('person', params={'id': 'eq.{}'.format(account['person_id'])})
        memberships = list(s.filter('membership', params={'person_id': 'eq.{}'.format(account['person_id'])}))
    
        # 'caching' that data in the session
        request.session['account'] = account
        request.session['person'] = person
        request.session['memberships'] = memberships

    For more information on signals refer to the django docs. They are great. Really.

    Tips and Tricks

    functools.partial

When using different APIs or a common configuration for postgrestutils.Session, it might be annoying to repeat the same kwargs all the time. functools.partial can be used to set up a common configuration once in a suitable place of your project:

    import functools
    import postgrestutils
    
    
    # Configure a foo_session for a different API your project uses
    foo_session = functools.partial(postgrestutils.Session, base_uri='https://another-api.com/', token='secret-token')
    
    # using the main API
    with postgrestutils.Session() as s:
    objs = list(s.filter('some_endpoint'))  # 'some_endpoint' is a stand-in name
    
    # using the other API
    with foo_session() as s:
    foo_objs = list(s.filter('some_endpoint'))

    Testing

postgrestutils has a bunch of unittests because manually testing it had become quite time-consuming. The tests aim to ensure functional correctness as well as some performance-related concerns, i.e. caching and laziness. The test suite is also a good place to look when trying to understand the guarantees postgrestutils makes. With requests-mock installed, running all tests is as easy as python -m unittest.