Fynn Becker authored
# postgrestutils

A very basic PostgREST client and utils

## Usage

The client is intended to be used by utilizing `postgrestutils.Session` as a context manager.
In fact it will not work if used otherwise, because the underlying `requests.Session` is created only when it is used as a context manager.
```python
import postgrestutils

params = {
    "select": "id,forename",
    "forename": "eq.Jebediah"
}

# this will send a request to
# 'POSTGREST_UTILS_BASE_URI/kerbal?select=id,forename&forename=eq.Jebediah'
with postgrestutils.Session() as s:
    res = s.filter('kerbal', params=params)
```
By default, constructing a new `postgrestutils.Session` takes the settings discussed in [Setup](#setup) into account.
Hence there is no need to specify `base_uri` or `token` explicitly unless you are using more than one API or database role in your project.
Additionally `postgrestutils.Session` takes `schema: Optional[str] = None`, `parse_dt: bool = True` and `count: postgrestutils.Count = postgrestutils.Count.NONE` (some of which are explained later on).
These options are session defaults and may be overridden on a per-request basis, e.g.
```python
import postgrestutils

with postgrestutils.Session(parse_dt=False) as s:
    print(s.get('random_datetime'))                 # makes request using parse_dt=False
    print(s.get('random_datetime', parse_dt=True))  # makes request using parse_dt=True
    print(s.get('random_datetime'))                 # makes request using parse_dt=False
```
Setup
Settings are either configured as django settings when postgrestutils
is used in a django project or as environment variables otherwise.
Django projects need to additionally add 'postgrestutils'
to the INSTALLED_APPS
setting.
### Shared settings

| Setting | Description |
|---|---|
| `POSTGREST_UTILS_BASE_URI` | base URI of the PostgREST instance to use |
| `POSTGREST_UTILS_JWT` | JWT for the corresponding database role and PostgREST instance |
### Django-only settings

| Setting | Description |
|---|---|
| `POSTGREST_UTILS_AUTOFETCH` | account columns to fetch on login (comma-separated string) |
## Making requests

postgrestutils tries to be as intuitive and pythonic as possible while also being efficient.
In order to do so it combines several concepts such as lazy evaluation, caching and counting strategies.
Understanding these concepts allows you to write the most efficient code for your specific use case.
If you're already familiar with how Django querysets work, this should feel fairly natural.
There are however a few differences that will be explained in detail below.
### Lazy evaluation

Akin to Django querysets, postgrestutils has a `JsonResultSet` that is returned from calls to `.filter()` without making any API calls yet.
If you're familiar with Django's rules for evaluation this list won't surprise you.
Since there are a few subtle differences, however, here is what will cause evaluation of a `JsonResultSet`:

- **Iteration.** A `JsonResultSet` is iterable and will fetch all elements from the API the first time you iterate over it.
- **Slicing.** This will fetch the elements in the specified range.
- **`repr()`.** As a convenience implementation for interactive interpreter sessions this will fetch the first 20 elements.
- **`len()`.** Unsurprisingly this returns the count of the requested table. Depending on the counting strategy this has different implications such as the cache being populated.
- **`list()`.** This can be useful to force evaluation of a `JsonResultSet`.
- **`bool()`.** Using a `JsonResultSet` in any boolean context will evaluate it.
- **Using the `.get()` method on a session.** Getting some lazy object when explicitly requesting a single element doesn't make much sense. Like Django's `Model.objects.get()` this will return the requested element or raise `postgrestutils.ObjectDoesNotExist`/`postgrestutils.MultipleObjectsReturned` if none or multiple objects were found.
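As a rough mental model of the evaluate-once behaviour described above, here is a pure-Python sketch (not the actual `JsonResultSet` implementation; `LazySeq` and `fake_fetch` are made-up names):

```python
class LazySeq:
    """Minimal sketch of the lazy-evaluation pattern: the fetch callable
    stands in for the API call and runs at most once, on first evaluation."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = None

    def _evaluate(self):
        if self._cache is None:
            self._cache = list(self._fetch())
        return self._cache

    def __iter__(self):
        return iter(self._evaluate())

    def __len__(self):
        return len(self._evaluate())


calls = []

def fake_fetch():
    calls.append('hit')  # count simulated API round trips
    return [{'id': 1}, {'id': 2}]

seq = LazySeq(fake_fetch)
assert calls == []       # constructing the object fetches nothing
assert len(seq) == 2     # first evaluation triggers the single fetch
assert list(seq) == [{'id': 1}, {'id': 2}]
assert calls == ['hit']  # the second evaluation re-used the cache
```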
### Pagination

Remember the part about postgrestutils trying to be intuitive and pythonic?
Do you know about Python's slicing notation?
Great, you already know how pagination works.
Just to be sure, here is a snippet of pagination in action:

```python
>>> with postgrestutils.Session() as s:
...     business_roles = s.filter('business_role')
...     business_roles[:3]   # fetches the first 3 business roles
...     business_roles[3:6]  # fetches the next 3 business roles
```
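When wiring this up to a paginated view, mapping a page number to a slice is plain arithmetic. `page_slice` below is a hypothetical helper, not part of postgrestutils:

```python
def page_slice(page: int, page_size: int) -> slice:
    """Map a 1-based page number to the slice covering that page."""
    start = (page - 1) * page_size
    return slice(start, start + page_size)


assert page_slice(1, 3) == slice(0, 3)  # first page  -> [:3]
assert page_slice(2, 3) == slice(3, 6)  # second page -> [3:6]
```

Since `obj[3:6]` is equivalent to `obj[slice(3, 6)]`, `business_roles[page_slice(2, 3)]` fetches the same range as `business_roles[3:6]` in the snippet above.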
### Caching

Also like Django querysets, there is a cache to minimize API calls. Here's a short snippet explaining the gist of it:

```python
# Bad: fetches the same data from the API twice
>>> with postgrestutils.Session() as s:
...     print([role['id'] for role in s.filter('business_role')])
...     print([role['name'] for role in s.filter('business_role')])

# Good: uses the cache, resulting in only a single API request
>>> with postgrestutils.Session() as s:
...     print([role['id'] for role in s.filter('business_role')])    # fetches all elements into the cache
...     print([role['name'] for role in s.filter('business_role')])  # re-uses the cached elements
```
#### When results are not cached

There are a few cases where a `JsonResultSet` will not cache results:

- **Indexing and slicing.** If the cache is not yet populated, indexing and slicing - even on the same index/ranges - will result in an API call.

  ```python
  # without populated cache
  >>> with postgrestutils.Session() as s:
  ...     business_roles = s.filter('business_role')
  ...     business_roles[5]  # fetches the 6th element from the API
  ...     business_roles[5]  # fetches the 6th element from the API again

  # with populated cache
  >>> with postgrestutils.Session() as s:
  ...     business_roles = s.filter('business_role')
  ...     list(business_roles)  # fetches all elements from the API
  ...     business_roles[5]  # re-uses the cached elements
  ...     business_roles[5]  # re-uses the cached elements
  ```

- **`repr()`.** Since this just returns a slice of itself, the cache won't be populated.
- **`len()` when not using the `count=postgrestutils.Count.NONE` kwarg.** Counting strategies other than `postgrestutils.Count.NONE` are not required to fetch all elements in order to determine their length. More on counting strategies below.
#### A note on caching and `len()`

Since calling `len()` can often be considerably expensive, its result is also cached.
Any subsequent calls will re-use the cache instead of making any API calls.
#### Invalidating the cache

If you have a `JsonResultSet` around that you want to re-use but need up-to-date data, simply call the `.refresh_from_pgrest()` method on it.
That will lazily refresh data from PostgREST by invalidating the cache.
Your object will now behave as if you had just created it.
### Counting strategies

PostgREST currently offers multiple counting strategies.
postgrestutils lets you decide which one to use by specifying the `count` kwarg on a session or passing it on a per-request basis to `.get()` and `.filter()`.
While this document attempts to explain counting strategies sufficiently, consulting the PostgREST documentation may be insightful at times.
#### Using `count=postgrestutils.Count.NONE`

If you don't need to know the count for your request, this is obviously a good counting strategy to choose.
But what happens if you need the count and just call `len()` on your `JsonResultSet` anyway?
This is again similar to what Django querysets do.
It will evaluate the `JsonResultSet`, fetching all elements from the API into the cache, and return the length of the cache.
#### Using `count=postgrestutils.Count.EXACT`

You've learned that `count=postgrestutils.Count.NONE` will count your elements just fine, so why would you ever want to use this option?
The reason is quite simple: fetching all elements for a large table can be expensive - and unnecessarily so if you don't even need them.
That's often the case when using pagination: you want to show a subset of all elements but also display how many pages with more elements there are.
To do so you need the count of all elements and the first few elements depending on your page size.
What you don't need, however, is all elements - so why fetch them?
This counting strategy allows you to get the count without fetching all elements.
So what happens when calling `len()` on your `JsonResultSet` this time?
postgrestutils will explicitly request the count for your request, which will be cheaper for large tables.
Be careful with this for very large tables, however, as an exact count can take a very long time, as explained in the PostgREST documentation.
#### Using `count=postgrestutils.Count.PLANNED`

Now what? Your table is very large, `postgrestutils.Count.EXACT` takes too long and `postgrestutils.Count.NONE` is out of the question entirely.
`postgrestutils.Count.PLANNED` to the rescue.
Using this counting strategy you explicitly instruct the client to leverage PostgreSQL statistics collected from `ANALYZE`-ing tables.
This will yield fairly accurate results depending on how often new statistics are collected.
#### Using `count=postgrestutils.Count.ESTIMATED`

So `postgrestutils.Count.NONE` and `postgrestutils.Count.EXACT` are feasible for small tables.
For very large tables those either take too long or require too much memory, and `postgrestutils.Count.PLANNED` is the only viable alternative.
However `postgrestutils.Count.PLANNED` can potentially lead to deviations even for small tables, where they are quite notable.
If only we could have the best of both worlds...
Enter `postgrestutils.Count.ESTIMATED`.
The idea is quite simple: `postgrestutils.Count.ESTIMATED` uses the `postgrestutils.Count.EXACT` strategy up to a certain threshold, then falls back to the `postgrestutils.Count.PLANNED` strategy.
That threshold is defined by the `max-rows` setting in your PostgREST configuration, which also limits the number of rows fetched per request.
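The threshold logic can be sketched as follows; `estimated_count` is a hypothetical helper illustrating the idea, not the postgrestutils API:

```python
def estimated_count(exact_count: int, planned_count: int, max_rows: int) -> int:
    """Sketch of the ESTIMATED strategy: trust the exact count while it
    stays below the max-rows threshold, otherwise fall back to the
    planner's estimate."""
    if exact_count < max_rows:
        return exact_count
    return planned_count


# small table: the exact count is below the threshold and wins
assert estimated_count(120, 118, max_rows=1000) == 120
# large table: the exact count is capped at max-rows, so the
# planner's estimate is used instead
assert estimated_count(1000, 987654, max_rows=1000) == 987654
```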
### Filtering

For the available filters, see the PostgREST API documentation: http://postgrest.org/en/stable/api.html
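As a taste of that syntax, the same `params` mechanism from the first example accepts any PostgREST operator. The operators below (`like`, `in`, `order`) are standard PostgREST query syntax; the column names are just the ones from the earlier example:

```python
# a few common PostgREST filters expressed as query params for s.filter()
params = {
    "select": "id,forename",   # column selection
    "forename": "like.*Jeb*",  # pattern match (PostgREST uses '*' instead of '%')
    "id": "in.(1,2,3)",        # membership test
    "order": "id.desc",        # ordering
}
```

Passing this dict to `s.filter('kerbal', params=params)` builds the corresponding query string.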
### Schema switching

As of v7.0.0, PostgREST supports schema switching.
By setting `db-schema` to a comma-separated string, multiple schemas may be specified, with the first schema being the default schema.
Assuming your PostgREST instance has `db-schema = 'foo, bar'` in its configuration:

```python
import postgrestutils

with postgrestutils.Session(schema='bar') as s:
    s.filter(…)  # uses the 'bar' schema
    s.filter(…, schema=postgrestutils.DEFAULT_SCHEMA)  # uses the default schema ('foo')
    s.filter(…, schema='foo')  # explicitly requests the 'foo' schema
    s.filter(…)  # uses the 'bar' schema
```
## Django helpers

### Custom `user_account_fetched` signal

postgrestutils provides a custom signal called `user_account_fetched` which provides the current request and the account of the current user on login.
To use this feature (and you really should, 99.9% of the time) configure your settings accordingly by specifying the columns you need from the `account` endpoint:

```python
# settings.py

POSTGREST_UTILS_AUTOFETCH = 'person_id,email'  # comma-separated string
```

When connecting to the signal you will have to provide a callback function.
That function is a great place for your custom logic to fetch and store additional information your app frequently needs into the session.
You can connect to it like so:
```python
# apps.py

from django.apps import AppConfig

from postgrestutils.signals import user_account_fetched
from your_app.utils import your_callback_func


class YourAppConfig(AppConfig):
    def ready(self, *args, **kwargs):
        super().ready(*args, **kwargs)
        user_account_fetched.connect(your_callback_func)
```
Your callback function could look something like this:

```python
# utils.py

import postgrestutils


def your_callback_func(sender, **kwargs):
    request = kwargs['request']
    account = kwargs['account']

    # fetch some additional data your project needs frequently (person, memberships etc.)
    with postgrestutils.Session() as s:
        person = s.get(…)
        memberships = list(s.filter(…))

    # 'caching' that data in the session
    request.session['account'] = account
    request.session['person'] = person
    request.session['memberships'] = memberships
```
For more information on signals refer to the Django docs. They are great. Really.
## Tips and Tricks

### functools.partial

When using different APIs or a common configuration for the `postgrestutils.Session`, it might be annoying to repeat the same `kwargs` all the time.
`functools.partial` can be used to pre-configure a common configuration in a suitable place of your project:
```python
import functools

import postgrestutils

# configure a foo_session for a different API your project uses
foo_session = functools.partial(
    postgrestutils.Session,
    base_uri='https://another-api.com/',
    token='secret-token'
)

# using the main API
with postgrestutils.Session() as s:
    objs = list(s.filter(…))

# using the other API
with foo_session() as s:
    foo_objs = list(s.filter(…))
```
## Testing

postgrestutils has a bunch of unit tests because manually testing it has become quite time-consuming.
The tests aim to ensure functional correctness as well as some performance-related concerns, i.e. caching and laziness.
They may also be a good place to look when trying to understand the guarantees postgrestutils makes.
With `requests-mock` installed, running all tests is as easy as `python -m unittest`.