>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
>>> t = ibis.memtable(
... [
... ("2020-04-15 08:04:00", 1),
... ("2020-04-15 08:06:00", 2),
... ("2020-04-15 08:09:00", 3),
... ("2020-04-15 08:11:00", 4),
... ],
... columns=["ts", "val"],
... ).cast({"ts": "timestamp"})TimestampValue
TimestampValue(arg)Attributes
| Name | Description |
|---|---|
| add | Add an interval to a timestamp. |
| radd | Add an interval to a timestamp. |
| sub | Subtract a timestamp or an interval from a timestamp. |
Methods
| Name | Description |
|---|---|
| bucket | Truncate the timestamp to buckets of a specified interval. |
| date | Return the date component of the expression. |
| delta | Compute the number of parts between two timestamps. |
| strftime | Format a timestamp according to format_str. |
| truncate | Truncate timestamp expression to units of unit. |
bucket
bucket(
interval=None,
*,
years=None,
quarters=None,
months=None,
weeks=None,
days=None,
hours=None,
minutes=None,
seconds=None,
milliseconds=None,
microseconds=None,
nanoseconds=None,
offset=None,
)Truncate the timestamp to buckets of a specified interval.
This is similar to truncate, but supports truncating to arbitrary intervals rather than a single unit. Buckets are computed as fixed intervals starting from the UNIX epoch. This origin may be offset by specifying offset.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| interval | Any | The bucket width as an interval. Alternatively may be specified via component keyword arguments. | None |
| years | int | None | Number of years | None |
| quarters | int | None | Number of quarters | None |
| months | int | None | Number of months | None |
| weeks | int | None | Number of weeks | None |
| days | int | None | Number of days | None |
| hours | int | None | Number of hours | None |
| minutes | int | None | Number of minutes | None |
| seconds | int | None | Number of seconds | None |
| milliseconds | int | None | Number of milliseconds | None |
| microseconds | int | None | Number of microseconds | None |
| nanoseconds | int | None | Number of nanoseconds | None |
| offset | Any | An interval to use to offset the start of the bucket. | None |
Returns
| Name | Type | Description |
|---|---|---|
| TimestampValue | The start of the bucket as a timestamp. |
Examples
Bucket the data into 5 minute wide buckets:
>>> t.ts.bucket(minutes=5)┏━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ TimestampBucket(ts, 5m) ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ timestamp │ ├─────────────────────────┤ │ 2020-04-15 08:00:00 │ │ 2020-04-15 08:05:00 │ │ 2020-04-15 08:05:00 │ │ 2020-04-15 08:10:00 │ └─────────────────────────┘
Bucket the data into 5 minute wide buckets, offset by 2 minutes:
>>> t.ts.bucket(minutes=5, offset=ibis.interval(minutes=2))┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ TimestampBucket(ts, 5m, 2m) ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ timestamp │ ├─────────────────────────────┤ │ 2020-04-15 08:02:00 │ │ 2020-04-15 08:02:00 │ │ 2020-04-15 08:07:00 │ │ 2020-04-15 08:07:00 │ └─────────────────────────────┘
One common use of timestamp bucketing is computing statistics per bucket. Here we compute the mean of val across 5 minute intervals:
>>> mean_by_bucket = (
... t.group_by(t.ts.bucket(minutes=5).name("bucket"))
... .agg(mean=_.val.mean())
... .order_by("bucket")
... )
>>> mean_by_bucket┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓ ┃ bucket ┃ mean ┃ ┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩ │ timestamp │ float64 │ ├─────────────────────┼─────────┤ │ 2020-04-15 08:00:00 │ 1.0 │ │ 2020-04-15 08:05:00 │ 2.5 │ │ 2020-04-15 08:10:00 │ 4.0 │ └─────────────────────┴─────────┘
date
date()Return the date component of the expression.
Returns
| Name | Type | Description |
|---|---|---|
| DateValue | The date component of self |
delta
delta(other, part)Compute the number of parts between two timestamps.
The second argument is subtracted from the first.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| other | datetime.datetime | Value[dt.Timestamp] | A timestamp expression | required |
| part | Literal['year', 'quarter', 'month', 'week', 'day', 'hour', 'minute', 'second', 'millisecond', 'microsecond', 'nanosecond'] | Value[dt.String] | The unit of time to compute the difference in | required |
Returns
| Name | Type | Description |
|---|---|---|
| IntegerValue | The number of parts between self and other |
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> start = ibis.time("01:58:00")
>>> end = ibis.time("23:59:59")
>>> end.delta(start, unit="hour")┌────┐
│ 22 │
└────┘
>>> data = '''tpep_pickup_datetime,tpep_dropoff_datetime
... 2016-02-01T00:23:56,2016-02-01T00:42:28
... 2016-02-01T00:12:14,2016-02-01T00:21:41
... 2016-02-01T00:43:24,2016-02-01T00:46:14
... 2016-02-01T00:55:11,2016-02-01T01:24:34
... 2016-02-01T00:11:13,2016-02-01T00:16:59'''
>>> with open("/tmp/triptimes.csv", "w") as f:
... nbytes = f.write(data) # nbytes is unused
>>> taxi = ibis.read_csv("/tmp/triptimes.csv")
>>> ride_duration = taxi.tpep_dropoff_datetime.delta(
... taxi.tpep_pickup_datetime, unit="minute"
... ).name("ride_minutes")
>>> ride_duration┏━━━━━━━━━━━━━━┓ ┃ ride_minutes ┃ ┡━━━━━━━━━━━━━━┩ │ int64 │ ├──────────────┤ │ 19 │ │ 9 │ │ 3 │ │ 29 │ │ 5 │ └──────────────┘
strftime
strftime(format_str)Format a timestamp according to format_str.
Format string may depend on the backend, but we try to conform to ANSI strftime.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| format_str | str | strftime format string |
required |
Returns
| Name | Type | Description |
|---|---|---|
| StringValue | Formatted version of arg |
truncate
truncate(unit)Truncate timestamp expression to units of unit.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| unit | Literal['Y', 'Q', 'M', 'W', 'D', 'h', 'm', 's', 'ms', 'us', 'ns'] | Unit to truncate to | required |
Returns
| Name | Type | Description |
|---|---|---|
| TimestampValue | Truncated timestamp expression |