Cloud Storage¶
bibtutils.gcp.storage¶
Functionality making use of GCP’s Cloud Storage.
See the official Cloud Storage Python Client documentation for details on the underlying library.
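As a quick orientation, a minimal round trip through the module might look like the sketch below (the bucket and blob names are hypothetical; the calls follow the signatures documented in this section):

    from bibtutils.gcp.storage import read_gcs, write_gcs

    # Write a small text blob, then read it back and print it.
    write_gcs('my_bucket', 'hello.txt', data='hello from bibtutils')
    print(read_gcs('my_bucket', 'hello.txt'))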
- bibtutils.gcp.storage.create_bucket(project, bucket_name, location='US', credentials=None)[source]¶
Creates a Google Cloud Storage bucket in the specified project.
- Parameters:
  - project (str) – the project in which to create the bucket. The account being used must have “Storage Admin” rights on the GCP project.
  - bucket_name (str) – the name of the bucket to create. Note that bucket names must be universally unique in GCP and must adhere to the GCS bucket naming guidelines: https://cloud.google.com/storage/docs/naming-buckets
  - location (str) – (Optional) if specified, creates the bucket in the desired location/region. Supported locations and regions are listed here: https://cloud.google.com/bigquery/docs/locations#locations_and_regions. If unspecified, defaults to 'US'.
  - credentials (google.oauth2.credentials.Credentials) – the credentials object to use when making the API call, if not to use the account running the function for authentication.
- Return type:
google.cloud.storage.bucket.Bucket
- Returns:
The bucket created during this function call.
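A minimal usage sketch, assuming the caller has “Storage Admin” rights on the (hypothetical) project below and that the returned object is the storage client's Bucket:

    from bibtutils.gcp.storage import create_bucket

    # Bucket names must be globally unique across GCP.
    bucket = create_bucket(
        'my-project',                     # hypothetical project ID
        'my-universally-unique-bucket',   # hypothetical bucket name
        location='US',
    )
    print(bucket.name)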
- bibtutils.gcp.storage.read_gcs(bucket_name, blob_name, decode=True, credentials=None)[source]¶
Reads the contents of a blob from GCS. Service account must have (at least) read permissions on the bucket/blob.
Note that for extremely large files, having decode=True can increase runtime substantially.

    from bibtutils.gcp.storage import read_gcs

    data = read_gcs('my_bucket', 'my_blob')
    print(data)
- Parameters:
  - bucket_name (str) – the bucket hosting the specified blob.
  - blob_name (str) – the blob to read from GCS.
  - decode (bool) – (Optional) whether or not to decode the blob contents into utf-8. Defaults to True.
  - credentials (google.oauth2.credentials.Credentials) – the credentials object to use when making the API call, if not to use the account running the function for authentication.
- Return type:
str
- Returns:
blob contents, decoded to utf-8.
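For binary blobs, or when the utf-8 decode is not wanted on a very large file, a sketch along these lines should work (the blob name is hypothetical):

    from bibtutils.gcp.storage import read_gcs

    # Skip the utf-8 decode and get raw bytes back.
    raw = read_gcs('my_bucket', 'my_archive.gz', decode=False)

    # For example, persist the bytes locally before unpacking.
    with open('/tmp/my_archive.gz', 'wb') as f:
        f.write(raw)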
- bibtutils.gcp.storage.read_gcs_nldjson(bucket_name, blob_name, **kwargs)[source]¶
Reads a blob in JSON NLD format from GCS and returns it as a list of dicts. Any extra arguments (kwargs) are passed to the read_gcs() function.

    from bibtutils.gcp.storage import read_gcs_nldjson

    data = read_gcs_nldjson('my_bucket', 'my_nldjson_blob')
    print([item['favorite_color'] for item in data])
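Because extra keyword arguments are forwarded to read_gcs(), an explicit credentials object can be passed through. The sketch below assumes application-default credentials (via google.auth.default()) are acceptable wherever the library expects a credentials object:

    import google.auth

    from bibtutils.gcp.storage import read_gcs_nldjson

    # Application-default credentials; forwarded to read_gcs() via **kwargs.
    creds, _ = google.auth.default()
    rows = read_gcs_nldjson('my_bucket', 'my_nldjson_blob', credentials=creds)
    print(f'read {len(rows)} rows')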
- bibtutils.gcp.storage.write_gcs(bucket_name, blob_name, data, mime_type='text/plain', create_bucket_if_not_found=False, timeout=60, credentials=None)[source]¶
Writes a string to GCS under the given blob name in the given bucket. The executing account must have (at least) write permissions on the bucket. If data is a str, it will be encoded as utf-8 before uploading.

    from bibtutils.gcp.storage import write_gcs

    write_gcs('my_bucket', 'my_blob', data='my favorite color is blue')
- Parameters:
  - bucket_name (str) – the name of the bucket to which to write.
  - blob_name (str) – the name of the blob to write.
  - mime_type (str) – (Optional) the MIME type of the data being uploaded. Defaults to 'text/plain'.
  - create_bucket_if_not_found (bool) – (Optional) if True, will attempt to create the bucket if it does not exist. Defaults to False.
  - credentials (google.oauth2.credentials.Credentials) – the credentials object to use when making the API call, if not to use the account running the function for authentication.
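A sketch of uploading JSON content with a non-default MIME type, creating the (hypothetical) bucket on the fly if it does not exist:

    import json

    from bibtutils.gcp.storage import write_gcs

    payload = json.dumps({'name': 'leo', 'favorite_color': 'red'})

    # Upload as application/json; create the bucket first if it is missing.
    write_gcs(
        'my_bucket',
        'my_blob.json',
        data=payload,
        mime_type='application/json',
        create_bucket_if_not_found=True,
    )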
- bibtutils.gcp.storage.write_gcs_nldjson(bucket_name, blob_name, json_data, add_date=False, **kwargs)[source]¶
Writes a dict (or list of dicts) to GCS under the given blob name in the given bucket. The executing account must have (at least) write permissions on the bucket. Use in conjunction with upload_gcs_json() to upload JSON data to BigQuery tables. Any extra arguments (kwargs) are passed to the write_gcs() function.

    from bibtutils.gcp.storage import write_gcs_nldjson

    write_gcs_nldjson(
        'my_bucket',
        'my_nldjson_blob',
        json_data=[
            {'name': 'leo', 'favorite_color': 'red'},
            {'name': 'matthew', 'favorite_color': 'blue'}
        ]
    )
- Parameters:
  - bucket_name (str) – the name of the bucket to which to write.
  - blob_name (str) – the name of the blob to write.
  - json_data (list OR dict) – the data to be written. Can be a list or a dict; a dict will be treated as one row of data (and converted to a one-item list). The data will be converted to a JSON NLD formatted string before uploading for compatibility with upload_gcs_json().
  - add_date (bool) – (Optional) whether or not to add the upload date to the data before upload. Defaults to False.
  - create_bucket_if_not_found (bool) – (Optional) if True, will attempt to create the bucket if it does not exist. Defaults to False.
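A sketch of writing a single row (a dict is treated as a one-item list) with the upload date stamped onto the data; the exact field name added by add_date is determined by the library:

    from bibtutils.gcp.storage import write_gcs_nldjson

    # One dict == one row of newline-delimited JSON.
    write_gcs_nldjson(
        'my_bucket',
        'daily_favorites',
        json_data={'name': 'leo', 'favorite_color': 'red'},
        add_date=True,
        create_bucket_if_not_found=True,  # forwarded to write_gcs() via kwargs
    )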