    # postgrestutils
    
    A very basic PostgREST client and utils
    
    ## Usage
    
    The client is intended to be used via the `postgrestutils.Session` context manager.
    In fact it will not work otherwise, since the underlying `requests.Session` is only created when it is entered as a context manager.
    
    ```python
    import postgrestutils
    
    
    params = {
        "select": "id,forename",
        "forename": "eq.Jebediah"
    }
    
    # this will send a request to 'POSTGREST_UTILS_BASE_URI/kerbal?select=id,forename&forename=eq.Jebediah'
    with postgrestutils.Session() as s:
        res = s.filter('kerbal', params=params)
    ```
    
    By default constructing a new `postgrestutils.Session` will take the settings discussed in [setup](#setup) into account.
    Hence there is no need to specify `base_uri` or `token` explicitly unless you are using more than one API or database role in your project.
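
    If you are, both can be passed as kwargs; a minimal sketch (the URI and token values are placeholders):

    ```python
    import postgrestutils


    # 'base_uri' and 'token' override the configured settings for this session only
    with postgrestutils.Session(base_uri='https://example.com/api/', token='secret-token') as s:
        res = s.filter('kerbal')
    ```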
    
    Additionally `postgrestutils.Session` takes `schema: Optional[str] = None`, `parse_dt: bool = True` and `count: postgrestutils.Count = postgrestutils.Count.NONE` (some of which are explained later on).
    These options are session defaults and may be overridden on a per-request basis, e.g.
    
    ```python
    import postgrestutils
    
    
    with postgrestutils.Session(parse_dt=False) as s:
        print(s.get('random_datetime'))  # makes request using parse_dt=False
        print(s.get('random_datetime', parse_dt=True))  # makes request using parse_dt=True
        print(s.get('random_datetime'))  # makes request using parse_dt=False
    ```
    
    ### Setup
    
    Settings are configured as django settings when `postgrestutils` is used in a django project, or as environment variables otherwise.
    Django projects additionally need to add `'postgrestutils'` to the `INSTALLED_APPS` setting.
    
    #### Shared settings
    
    | Setting                   | Description
    |---------------------------|---------------------------------------------------------------
    |`POSTGREST_UTILS_BASE_URI` | base uri of the PostgREST instance to use
    |`POSTGREST_UTILS_JWT`      | JWT for the corresponding database role and PostgREST instance
    
    #### Django-only settings
    
    | Setting                   | Description
    |---------------------------|---------------------------------------------------------------
    |`POSTGREST_UTILS_AUTOFETCH`| account columns to fetch on login (comma-separated str)
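
    For a django project the configuration might look like the following sketch (all values are placeholders):

    ```python
    # settings.py
    INSTALLED_APPS = [
        # ...
        'postgrestutils',
    ]

    POSTGREST_UTILS_BASE_URI = 'https://example.com/api/'
    POSTGREST_UTILS_JWT = 'secret-token'
    POSTGREST_UTILS_AUTOFETCH = 'person_id,email'  # django-only, see the signals section below
    ```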
    
    ### Making requests
    
    `postgrestutils` tries to be as intuitive and pythonic as possible while also being efficient.
    In order to do so it combines several concepts such as lazy evaluation, caching and counting strategies.
    
    Understanding these concepts allows you to write the most efficient code for your specific use case.
    If you're already familiar with how django querysets work this should feel fairly natural to you.
    There are however a few differences that will be explained in detail below.
    
    #### Lazy evaluation
    
    Akin to django querysets, `postgrestutils` has a `JsonResultSet` that is returned from calls to `.filter()` without making any API calls yet.
    If you're familiar with django's rules for evaluation this list won't surprise you.
    Since there are a few subtle differences, however, here is what will cause evaluation of a `JsonResultSet` (a short sketch follows the list):
    
    - Iteration.
    A `JsonResultSet` is iterable and will fetch all elements from the API the first time you iterate over it.
    - Slicing.
    This will fetch the elements in the specified range.
    - `repr()`.
    As a convenience implementation for interactive interpreter sessions this will fetch the first 20 elements.
    - `len()`.
    Unsurprisingly this returns the number of elements matching the request.
    Depending on the [counting strategy](#counting-strategies) this has different implications such as the cache being populated.
    - `list()`.
    This can be useful to force evaluation of a `JsonResultSet`.
    - `bool()`.
    Using a `JsonResultSet` in any boolean context will evaluate it.
    - Using the `.get()` method on a session.
    Returning a lazy object when a single element is explicitly requested wouldn't make much sense.
    Like django's `Model.objects.get()` this will return the requested element or raise a `postgrestutils.ObjectDoesNotExist`/`postgrestutils.MultipleObjectsReturned` if none or multiple objects were found.
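
    A minimal sketch of these rules in action, re-using the `kerbal` endpoint from the first example (assuming `.get()` accepts the same `params` as `.filter()`):

    ```python
    >>> with postgrestutils.Session() as s:
    ...     kerbals = s.filter('kerbal')  # lazy, no API call yet
    ...     list(kerbals)  # iteration: fetches all elements
    ...     jeb = s.get('kerbal', params={'forename': 'eq.Jebediah'})  # fetches exactly one element or raises
    ```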
    
    #### Pagination
    
    Remember the part about `postgrestutils` trying to be intuitive and pythonic?
    Do you know about Python's slicing notation?
    Great, you already know how pagination works.
    Just to be sure here is a snippet of pagination in action:
    
    ```python
    >>> with postgrestutils.Session() as s:
    ...     business_roles = s.filter('business_role')
    ...     business_roles[:3]  # fetches the first 3 business roles
    ...     business_roles[3:6]  # fetches the next 3 business roles
    ```
    
    #### Caching
    
    Also like django querysets there is a cache to minimize API calls.
    Here's a short snippet explaining the gist of it:
    
    ```python
    # Bad: Fetches the same data from the API twice
    >>> with postgrestutils.Session() as s:
    ...     print([role['id'] for role in s.filter('business_role')])
    ...     print([role['name'] for role in s.filter('business_role')])
    
    # Good: Uses the cache resulting in only a single API request
    >>> with postgrestutils.Session() as s:
    ...     print([role['id'] for role in s.filter('business_role')])  # fetches all elements into the cache
    ...     print([role['name'] for role in s.filter('business_role')])  # re-uses the cached elements
    ```
    
    ##### When results are not cached
    
    There are a few cases where a `JsonResultSet` will not cache results:
    
    - Indexing and slicing.
    If the cache is not yet populated, indexing and slicing (even with the same index or range) will each result in an API call.
    
    ```python
    # without populated cache
    >>> with postgrestutils.Session() as s:
    ...     business_roles = s.filter('business_role')
    ...     business_roles[5]  # fetches the 6th element from the API
    ...     business_roles[5]  # fetches the 6th element from the API again
    
    # with populated cache
    >>> with postgrestutils.Session() as s:
    ...     business_roles = s.filter('business_role')
    ...     list(business_roles)  # fetches all elements from the API
    ...     business_roles[5]  # re-uses the cached elements
    ...     business_roles[5]  # re-uses the cached elements
    ```
    
    - `repr()`.
    Since this just returns a slice of itself the cache won't be populated.
    - `len()` when using a counting strategy other than `count=postgrestutils.Count.NONE`.
    Those strategies do not need to fetch all elements in order to determine the count.
    [More on counting strategies.](#counting-strategies)
    
    ##### A note on caching and `len()`
    
    Since calling `len()` can often be considerably expensive its result is also cached.
    Any subsequent calls will re-use the cache instead of making any API calls.
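
    For example:

    ```python
    >>> with postgrestutils.Session() as s:
    ...     business_roles = s.filter('business_role')
    ...     len(business_roles)  # fetches the count via an API call and caches it
    ...     len(business_roles)  # re-uses the cached count, no API call
    ```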
    
    
    ##### Invalidating the cache
    
    If you have a `JsonResultSet` around that you want to re-use but need up-to-date data simply call the `.refresh_from_pgrest()` method on it.
    That will lazily refresh data from PostgREST by invalidating the cache.
    Your object will now behave as if you just created it.
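
    For example:

    ```python
    >>> with postgrestutils.Session() as s:
    ...     business_roles = s.filter('business_role')
    ...     list(business_roles)  # fetches all elements into the cache
    ...     business_roles.refresh_from_pgrest()  # invalidates the cache, no API call yet
    ...     list(business_roles)  # fetches all elements again
    ```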
    
    #### Counting strategies
    
    PostgREST currently offers multiple [counting strategies](http://postgrest.org/en/stable/api.html#exact-count).
    `postgrestutils` lets you decide which to use by specifying the `count` kwarg on a session or passing it on a per-request basis to `.get()` and `.filter()`.
    While this document attempts to explain counting strategies sufficiently, consulting the linked PostgREST documentation may be insightful at times.
    
    ##### Using `count=postgrestutils.Count.NONE`
    
    If you don't need to know the count for your request this is obviously a good counting strategy to choose.
    But what happens if you need the count and just call `len()` on your `JsonResultSet` anyway?
    This is again similar to what django querysets do.
    It will evaluate the `JsonResultSet` fetching all elements from the API into the cache and return the length of the cache.
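
    A short sketch of that behavior:

    ```python
    >>> with postgrestutils.Session(count=postgrestutils.Count.NONE) as s:
    ...     business_roles = s.filter('business_role')
    ...     len(business_roles)  # fetches all elements into the cache, returns the cache's length
    ...     list(business_roles)  # re-uses the cached elements, no further API call
    ```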
    
    ##### Using `count=postgrestutils.Count.EXACT`
    
    You've learned that `count=postgrestutils.Count.NONE` will count your elements just fine so why would you ever want to use this option?
    The reason is quite simple: fetching all elements of a large table can be expensive, and unnecessarily so if you don't even need them.
    That's often the case when using pagination.
    You want to show a subset of all elements but also display how many pages with more elements there are.
    To do so you need the count of all elements and the first few elements depending on your page size.
    What you don't need however is all elements so why fetch them?
    
    This counting strategy allows you to get the count without fetching all elements.
    So what happens when calling `len()` on your `JsonResultSet` this time?
    `postgrestutils` will explicitly request the count for your query, which is cheaper for large tables.
    
    Be careful with this for very large tables however as this can take a very long time as explained in the [PostgREST documentation](http://postgrest.org/en/stable/admin.html#count-header-dos).
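
    A sketch of the pagination use case described above:

    ```python
    >>> with postgrestutils.Session(count=postgrestutils.Count.EXACT) as s:
    ...     business_roles = s.filter('business_role')
    ...     total = len(business_roles)  # requests only the exact count, no elements are fetched
    ...     first_page = business_roles[:10]  # fetches just the first 10 elements
    ```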
    
    ##### Using `count=postgrestutils.Count.PLANNED`
    
    Now what?
    Your table is very large, `postgrestutils.Count.EXACT` takes too long and `postgrestutils.Count.NONE` is out of the question entirely.
    
    `postgrestutils.Count.PLANNED` to the rescue.
    Using this counting strategy you explicitly instruct the client to leverage PostgreSQL statistics collected from `ANALYZE`ing tables.
    This will yield fairly accurate results depending on how often new statistics are collected.
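
    A sketch, assuming a hypothetical large `event_log` endpoint:

    ```python
    >>> with postgrestutils.Session() as s:
    ...     events = s.filter('event_log', count=postgrestutils.Count.PLANNED)
    ...     len(events)  # returns the planner's estimate, fast even for very large tables
    ```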
    
    ##### Using `count=postgrestutils.Count.ESTIMATED`
    
    So `postgrestutils.Count.NONE` and `postgrestutils.Count.EXACT` are feasible for small tables.
    For very large tables they either take too long or require too much memory, leaving `postgrestutils.Count.PLANNED` as the only viable alternative.
    However `postgrestutils.Count.PLANNED` can potentially lead to deviations, which are especially notable for small tables.
    If only we could have the best of both worlds...
    
    Enter `postgrestutils.Count.ESTIMATED`.
    The idea is quite simple: `postgrestutils.Count.ESTIMATED` uses the `postgrestutils.Count.EXACT` strategy up to a certain threshold then falls back to the `postgrestutils.Count.PLANNED` strategy.
    That threshold is defined by the `max-rows` setting in your PostgREST configuration, which also limits the number of rows fetched per request.
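
    A sketch, again assuming a hypothetical large `event_log` endpoint:

    ```python
    >>> with postgrestutils.Session(count=postgrestutils.Count.ESTIMATED) as s:
    ...     len(s.filter('business_role'))  # few rows, below max-rows: exact count
    ...     len(s.filter('event_log'))  # many rows, above max-rows: falls back to the planner's estimate
    ```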
    
    #### Filtering

    Filtering is done by passing PostgREST query parameters via the `params` kwarg, as shown in the examples above.
    The available operators and syntax are described in the [PostgREST API documentation](http://postgrest.org/en/stable/api.html).
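
    A short sketch of some common operators (the columns and values are just examples):

    ```python
    import postgrestutils


    params = {
        "select": "id,forename",  # only fetch these columns
        "age": "gte.30",  # age >= 30
        "forename": "like.Jeb*",  # pattern matching, '*' is the wildcard
        "order": "id.desc",  # sort by id, descending
    }

    with postgrestutils.Session() as s:
        res = s.filter('kerbal', params=params)
    ```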
    
    #### Schema switching
    
    PostgREST added support for schema switching in v7.0.0.
    By setting `db-schema` to a comma-separated string, multiple schemas may be specified with the first schema being the default.
    Assuming your PostgREST instance has `db-schema = 'foo, bar'` in its configuration (`some_table` is a placeholder endpoint):
    
    ```python
    import postgrestutils
    
    
    with postgrestutils.Session(schema='bar') as s:
        s.filter('some_table')  # uses the 'bar' schema
        s.filter('some_table', schema=postgrestutils.DEFAULT_SCHEMA)  # uses the default schema ('foo')
        s.filter('some_table', schema='foo')  # explicitly requests the 'foo' schema
        s.filter('some_table')  # still uses the 'bar' schema
    ```
    
    ### Django helpers
    
    #### custom `user_account_fetched` signal
    
    `postgrestutils` provides a custom signal called `user_account_fetched` which provides the current request and the account of the current user on login.
    To use this feature (and you really should 99.9% of the time) configure your settings accordingly by specifying the columns you need from the `account` endpoint:
    
    ```python
    # settings.py
    POSTGREST_UTILS_AUTOFETCH = 'person_id,email'  # comma-separated string
    ```
    
    When connecting to the signal you will have to provide a callback function.
    That function is a great place for your custom logic to fetch and store additional information your app frequently needs into the session.
    You can connect to it like so:
    
    ```python
    # apps.py
    from django.apps import AppConfig
    from postgrestutils.signals import user_account_fetched
    from your_app.utils import your_callback_func
    
    
    class YourAppConfig(AppConfig):
        def ready(self, *args, **kwargs):
            super().ready(*args, **kwargs)
            user_account_fetched.connect(your_callback_func)
    ```
    
    Your callback function could look something like this:
    
    ```python
    # utils.py
    import postgrestutils
    
    
    def your_callback_func(sender, **kwargs):
        request = kwargs['request']
        account = kwargs['account']
        # fetch some additional data your project needs frequently (person, memberships etc.);
        # the endpoints and filters below are placeholders for your own API
        with postgrestutils.Session() as s:
            person = s.get('person', params={'id': 'eq.{}'.format(account['person_id'])})
            memberships = list(s.filter('membership', params={'person_id': 'eq.{}'.format(account['person_id'])}))
    
        # 'caching' that data in the session
        request.session['account'] = account
        request.session['person'] = person
        request.session['memberships'] = memberships
    ```
    
    For more information on signals refer to the django docs. They are great. Really.
    
    ### Tips and Tricks
    
    #### `functools.partial`
    
    When using different APIs or a common configuration for the `postgrestutils.Session` it might be annoying to repeat the same `kwargs` all the time.
    `functools.partial` can be used to pre-configure a common configuration in a suitable place of your project:
    
    ```python
    import functools
    import postgrestutils
    
    
    # Configure a foo_session for a different API your project uses
    foo_session = functools.partial(postgrestutils.Session, base_uri='https://another-api.com/', token='secret-token')
    
    # using the main API ('some_table' and 'foo_table' are placeholder endpoints)
    with postgrestutils.Session() as s:
        objs = list(s.filter('some_table'))

    # using the other API
    with foo_session() as s:
        foo_objs = list(s.filter('foo_table'))
    ```
    
    ### Testing
    
    `postgrestutils` ships with a suite of unittests because manually testing it has become quite time-consuming.
    The tests aim to ensure functional correctness as well as performance-related behavior, i.e. caching and laziness.
    They are also a good place to look when trying to understand the guarantees `postgrestutils` makes.
    With `requests-mock` installed, running all tests is as easy as `python -m unittest`.