All string operations are valid for both scalars and columns.
StringValue
Methods
Name | Description |
---|
ascii_str | Return the numeric ASCII code of the first character of a string. |
authority | Parse a URL and extract authority. |
capitalize | Uppercase the first letter, lowercase the rest. |
concat | Concatenate strings. |
contains | Return whether the expression contains substr. |
endswith | Determine if self ends with end. |
find | Return the position of the first occurrence of substring. |
find_in_set | Find the position of self within str_list, a list of strings. |
fragment | Parse a URL and extract fragment identifier. |
host | Parse a URL and extract host. |
length | Compute the length of a string. |
levenshtein | Return the Levenshtein distance between two strings. |
lower | Convert string to all lowercase. |
lpad | Pad arg by truncating on the right or padding on the left. |
lstrip | Remove whitespace from the left side of string. |
path | Parse a URL and extract path. |
protocol | Parse a URL and extract protocol. |
query | Parse a URL and return the query string or a query string parameter. |
re_extract | Return the specified match at index from a regex pattern. |
re_replace | Replace all matches found by regex pattern with replacement. |
re_search | Return whether the values match pattern. |
re_split | Split a string by a regular expression pattern. |
repeat | Repeat a string n times. |
replace | Replace each exact match of pattern with replacement. |
reverse | Reverse the characters of a string. |
right | Return up to nchars from the end of each string. |
rpad | Pad self by truncating or padding on the right. |
rstrip | Remove whitespace from the right side of string. |
split | Split a string on delimiter. |
startswith | Determine whether self starts with start. |
strip | Remove whitespace from left and right sides of a string. |
substr | Extract a substring. |
to_date | |
translate | Replace from_str characters in self with characters in to_str. |
upper | Convert string to all uppercase. |
userinfo | Parse a URL and extract user info. |
ascii_str
Return the numeric ASCII code of the first character of a string.
Returns
Name | Type | Description |
---|
| IntegerValue | ASCII code of the first character of the input |
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"s": ["abc", "def", "ghi"]})
>>> t.s.ascii_str()
┏━━━━━━━━━━━━━━━━┓
┃ StringAscii(s) ┃
┡━━━━━━━━━━━━━━━━┩
│ int32 │
├────────────────┤
│ 97 │
│ 100 │
│ 103 │
└────────────────┘
authority
Parse a URL and extract authority.
Examples
>>> import ibis
>>> url = ibis.literal("https://user:pass@example.com:80/docs/books")
>>> result = url.authority() # user:pass@example.com:80
Returns
capitalize
Uppercase the first letter, lowercase the rest.
This API matches the semantics of the Python str.capitalize method.
Returns
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"s": ["aBC", " abc", "ab cd", None]})
>>> t.s.capitalize()
┏━━━━━━━━━━━━━━━┓
┃ Capitalize(s) ┃
┡━━━━━━━━━━━━━━━┩
│ string │
├───────────────┤
│ Abc │
│ abc │
│ Ab cd │
│ NULL │
└───────────────┘
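Because the method is documented as matching Python's own capitalization rule, the non-NULL rows of the table above can be cross-checked locally. The snippet below is only a plain-Python sanity check on ordinary strings, not an Ibis expression, and it leaves out the NULL row.

```python
# Plain-Python check of the capitalization rule; no Ibis involved.
samples = ["aBC", " abc", "ab cd"]

# str.capitalize uppercases the first character and lowercases the rest.
# A leading space counts as the first character, so " abc" is unchanged.
capitalized = [s.capitalize() for s in samples]
print(capitalized)  # ['Abc', ' abc', 'Ab cd']
```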
concat
Concatenate strings.
NULLs are propagated. This method is equivalent to using the + operator.
Parameters
Returns
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"s": ["abc", None]})
>>> t.s.concat("xyz", "123")
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ StringConcat((s, 'xyz', '123')) ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string │
├─────────────────────────────────┤
│ abcxyz123 │
│ NULL │
└─────────────────────────────────┘
>>> t.s + "xyz"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ StringConcat((s, 'xyz')) ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string │
├──────────────────────────┤
│ abcxyz │
│ NULL │
└──────────────────────────┘
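NULL propagation means that a single NULL input makes the whole result NULL. The following plain-Python sketch models that rule, using None as a stand-in for NULL; the helper name `concat` is hypothetical and this is not how any backend implements the operation.

```python
from typing import Optional


def concat(*parts: Optional[str]) -> Optional[str]:
    """Join parts; return None (standing in for NULL) if any part is None."""
    if any(part is None for part in parts):
        return None
    return "".join(parts)


print(concat("abc", "xyz", "123"))  # abcxyz123
print(concat(None, "xyz", "123"))   # None
```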
contains
Return whether the expression contains substr.
Parameters
Name | Type | Description | Default |
---|
substr | str | StringValue | Substring for which to check | required |
Returns
Name | Type | Description |
---|
| BooleanValue | Boolean indicating the presence of substr in the expression |
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"s": ["bab", "ddd", "eaf"]})
>>> t.s.contains("a")
┏━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ StringContains(s, 'a') ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━┩
│ boolean │
├────────────────────────┤
│ True │
│ False │
│ True │
└────────────────────────┘
endswith
Determine if self ends with end.
Parameters
Name | Type | Description | Default |
---|
end | str | StringValue | Suffix to check for | required |
Returns
Name | Type | Description |
---|
| BooleanValue | Boolean indicating whether self ends with end |
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"s": ["Ibis project", "GitHub"]})
>>> t.s.endswith("project")
┏━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ EndsWith(s, 'project') ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━┩
│ boolean │
├────────────────────────┤
│ True │
│ False │
└────────────────────────┘
find
find(substr, start=None, end=None)
Return the position of the first occurrence of substring.
Parameters
Name | Type | Description | Default |
---|
substr | str | StringValue | Substring to search for | required |
start | int | ir.IntegerValue | None | Zero-based index of where to start the search | None |
end | int | ir.IntegerValue | None | Zero-based index of where to stop the search. Currently not implemented. | None |
Returns
Name | Type | Description |
---|
| IntegerValue | Position of substr in arg starting from start |
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"s": ["abc", "bac", "bca"]})
>>> t.s.find("a")
┏━━━━━━━━━━━━━━━━━━━━┓
┃ StringFind(s, 'a') ┃
┡━━━━━━━━━━━━━━━━━━━━┩
│ int64 │
├────────────────────┤
│ 0 │
│ 1 │
│ 2 │
└────────────────────┘
>>> t.s.find("z")
┏━━━━━━━━━━━━━━━━━━━━┓
┃ StringFind(s, 'z') ┃
┡━━━━━━━━━━━━━━━━━━━━┩
│ int64 │
├────────────────────┤
│ -1 │
│ -1 │
│ -1 │
└────────────────────┘
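The two results above (zero-based positions, and -1 when the substring is absent) follow the same convention as Python's built-in `str.find`. The comparison below is only a plain-Python analogy on the same sample values; it does not model the `start` or `end` parameters.

```python
# Plain-Python analogy for the position convention; no Ibis involved.
rows = ["abc", "bac", "bca"]

found = [s.find("a") for s in rows]    # zero-based position of "a"
missing = [s.find("z") for s in rows]  # -1 when the substring is absent

print(found)    # [0, 1, 2]
print(missing)  # [-1, -1, -1]
```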
find_in_set
Find the position of self within str_list, a list of strings.
No string in str_list can contain a comma.
Parameters
Name | Type | Description | Default |
---|
str_list | Sequence[str] | Sequence of strings | required |
Returns
Name | Type | Description |
---|
| IntegerValue | Position of self in str_list. Returns -1 if self isn’t found or if self contains ','. |
Examples
>>> import ibis
>>> table = ibis.table(dict(string_col="string"))
>>> result = table.string_col.find_in_set(["a", "b"])
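The example above only builds the expression, so here is a plain-Python sketch of the lookup it describes. It assumes the method reports the position of the string value within the list, that positions are zero-based, and that a value containing a comma is treated as not found; the helper below is hypothetical and is not Ibis code.

```python
from typing import Sequence


def find_in_set(value: str, str_list: Sequence[str]) -> int:
    """Return the assumed zero-based position of value in str_list, or -1."""
    # A value containing a comma is reported as not found.
    if "," in value:
        return -1
    try:
        return list(str_list).index(value)
    except ValueError:
        return -1


print(find_in_set("b", ["a", "b"]))    # 1
print(find_in_set("z", ["a", "b"]))    # -1
print(find_in_set("a,b", ["a", "b"]))  # -1
```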
fragment
Parse a URL and extract fragment identifier.
Examples
>>> import ibis
>>> url = ibis.literal("https://example.com:80/docs/#DOWNLOADING")
>>> result = url.fragment() # DOWNLOADING
Returns
host
Parse a URL and extract host.
Examples
>>> import ibis
>>> url = ibis.literal("https://user:pass@example.com:80/docs/books")
>>> result = url.host() # example.com
Returns
length
Compute the length of a string.
Returns
Name | Type | Description |
---|
| IntegerValue | The length of each string in the expression |
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"s": ["aaa", "a", "aa"]})
>>> t.s.length()
┏━━━━━━━━━━━━━━━━━┓
┃ StringLength(s) ┃
┡━━━━━━━━━━━━━━━━━┩
│ int32 │
├─────────────────┤
│ 3 │
│ 1 │
│ 2 │
└─────────────────┘
levenshtein
Return the Levenshtein distance between two strings.
Parameters
Name | Type | Description | Default |
---|
other | StringValue | String to compare to | required |
Returns
Name | Type | Description |
---|
| IntegerValue | The edit distance between the two strings |
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> s = ibis.literal("kitten")
>>> s.levenshtein("sitting")
┌───┐
│ 3 │
└───┘
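For readers unfamiliar with the metric, the Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. The following is a minimal plain-Python sketch of the standard dynamic-programming formulation; it illustrates the definition and is not a claim about how any backend computes it.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, keeping only one previous row in memory."""
    # previous[j] is the distance between the empty prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # delete from a
                    current[j - 1] + 1,                   # insert into a
                    previous[j - 1] + substitution_cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```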
lower
Convert string to all lowercase.
Returns
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"s": ["AAA", "a", "AA"]})
>>> t
┏━━━━━━━━┓
┃ s ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ AAA │
│ a │
│ AA │
└────────┘
>>> t.s.lower()
┏━━━━━━━━━━━━━━┓
┃ Lowercase(s) ┃
┡━━━━━━━━━━━━━━┩
│ string │
├──────────────┤
│ aaa │
│ a │
│ aa │
└──────────────┘
lpad
Pad arg by truncating on the right or padding on the left.
Parameters
Returns
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"s": ["abc", "def", "ghij"]})
>>> t.s.lpad(5, "-")
┏━━━━━━━━━━━━━━━━━┓
┃ LPad(s, 5, '-') ┃
┡━━━━━━━━━━━━━━━━━┩
│ string │
├─────────────────┤
│ --abc │
│ --def │
│ -ghij │
└─────────────────┘
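The table above shows only the padding case. The plain-Python sketch below models both halves of the description: strings shorter than the target length are padded on the left, and longer strings are cut on the right. The helper name, its default pad character, and the handling of multi-character pads are assumptions made for illustration, not Ibis behavior confirmed by this page.

```python
def lpad(s: str, length: int, pad: str = " ") -> str:
    """Return s adjusted to exactly `length` characters (sketch only)."""
    if len(s) >= length:
        # Too long (or already exact): keep the leftmost `length` characters.
        return s[:length]
    # Too short: repeat the pad and trim it to the number of missing characters.
    fill = (pad * length)[: length - len(s)]
    return fill + s


print(lpad("abc", 5, "-"))     # --abc
print(lpad("ghij", 5, "-"))    # -ghij
print(lpad("abcdef", 3, "-"))  # abc
```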
lstrip
Remove whitespace from the left side of string.
Returns
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"s": ["\ta\t", "\nb\n", "\vc\t"]})
>>> t
┏━━━━━━━━┓
┃ s ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ \ta\t │
│ \nb\n │
│ \vc\t │
└────────┘
>>> t.s.lstrip()
┏━━━━━━━━━━━┓
┃ LStrip(s) ┃
┡━━━━━━━━━━━┩
│ string │
├───────────┤
│ a\t │
│ b\n │
│ c\t │
└───────────┘
path
Parse a URL and extract path.
Examples
>>> import ibis
>>> url = ibis.literal(
... "https://example.com:80/docs/books/tutorial/index.html?name=networking"
... )
>>> result = url.path() # docs/books/tutorial/index.html
Returns
protocol
Parse a URL and extract protocol.
Examples
>>> import ibis
>>> url = ibis.literal("https://user:pass@example.com:80/docs/books")
>>> result = url.protocol() # https
Returns
query
Parse a URL and return the query string or a query string parameter.
If key is passed, return the value of the query string parameter with that name.
If key is absent, return the entire query string.
Parameters
Name | Type | Description | Default |
---|
key | str | StringValue | None | Query component to extract | None |
Examples
>>> import ibis
>>> url = ibis.literal(
... "https://example.com:80/docs/books/tutorial/index.html?name=networking"
... )
>>> result = url.query() # name=networking
>>> query_name = url.query("name") # networking
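The two calls above correspond to "whole query string" versus "one named parameter". The same distinction can be seen with Python's standard `urllib.parse` module; this is only a local analogy on a plain string, not what a backend executes.

```python
from urllib.parse import parse_qs, urlsplit

url = "https://example.com:80/docs/books/tutorial/index.html?name=networking"

# Whole query string, comparable to calling query() without a key.
query_string = urlsplit(url).query

# One named parameter, comparable to query("name").
# parse_qs maps each key to a list because keys may repeat in a URL.
name_values = parse_qs(query_string).get("name", [])

print(query_string)  # name=networking
print(name_values)   # ['networking']
```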
Returns
re_extract
re_extract(pattern, index)
Return the specified match at index from a regex pattern.
Parameters
Name | Type | Description | Default |
---|
pattern | str | StringValue | Regular expression pattern string | required |
index | int | ir.IntegerValue | The index of the match group to return. The behavior of this function follows the behavior of Python’s match objects: when index is zero and there’s a match, return the entire match, otherwise return the content of the index-th match group. | required |
Returns
Name | Type | Description |
---|
| StringValue | Extracted match or whole string if index is zero |
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"s": ["abc", "bac", "bca"]})
Extract a specific group
>>> t.s.re_extract(r"^(a)bc", 1)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ RegexExtract(s, '^(a)bc', 1) ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string │
├──────────────────────────────┤
│ a │
│ ~ │
│ ~ │
└──────────────────────────────┘
Extract the entire match
>>> t.s.re_extract(r"^(a)bc", 0)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ RegexExtract(s, '^(a)bc', 0) ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string │
├──────────────────────────────┤
│ abc │
│ ~ │
│ ~ │
└──────────────────────────────┘
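Since the index convention is described in terms of Python's match objects, the same two extractions can be reproduced with the standard `re` module. The helper `extract` is hypothetical, and None is used here to mark "no match"; how a particular backend represents a non-matching row is not modelled.

```python
import re
from typing import Optional


def extract(pattern: str, text: str, index: int) -> Optional[str]:
    """Return group `index` of the first match, or None when nothing matches."""
    match = re.search(pattern, text)
    return match.group(index) if match else None


print(extract(r"^(a)bc", "abc", 1))  # a    (first capture group)
print(extract(r"^(a)bc", "abc", 0))  # abc  (entire match)
print(extract(r"^(a)bc", "bac", 1))  # None (pattern does not match)
```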
re_replace
re_replace(pattern, replacement)
Replace all matches found by regex pattern with replacement.
Parameters
Name | Type | Description | Default |
---|
pattern | str | StringValue | Regular expression string | required |
replacement | str | StringValue | Replacement string or regular expression | required |
Returns
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"s": ["abc", "bac", "bca", "this has multi \t whitespace"]})
>>> s = t.s
Replace all “a”s that are at the beginning of the string with “b”:
>>> s.re_replace("^a", "b")
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ RegexReplace(s, '^a', 'b') ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string │
├───────────────────────────────┤
│ bbc │
│ bac │
│ bca │
│ this has multi \t whitespace │
└───────────────────────────────┘
Double up any “a”s or “b”s, using capture groups and backreferences:
>>> s.re_replace("([ab])", r"\0\0")
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ RegexReplace(s, '([ab])', '\\0\\0') ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string │
├─────────────────────────────────────┤
│ aabbc │
│ bbaac │
│ bbcaa │
│ this haas multi \t whitespaace │
└─────────────────────────────────────┘
Normalize all whitespace to a single space:
>>> s.re_replace(r"\s+", " ")
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ RegexReplace(s, '\\s+', ' ') ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string │
├──────────────────────────────┤
│ abc │
│ bac │
│ bca │
│ this has multi whitespace │
└──────────────────────────────┘
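The three replacements above can be approximated with Python's `re.sub` as a local cross-check. One difference to keep in mind: Python's `re` module does not read `\0` in a replacement as "the whole match", so this sketch refers to the capture group with `\1` instead. It is an analogy on plain strings, not backend behavior.

```python
import re

rows = ["abc", "bac", "bca", "this has multi \t whitespace"]

# Replace an "a" at the start of the string with "b".
anchored = [re.sub("^a", "b", s) for s in rows]

# Double every "a" or "b" by repeating capture group 1.
doubled = [re.sub("([ab])", r"\1\1", s) for s in rows]

# Collapse each run of whitespace into a single space.
normalized = [re.sub(r"\s+", " ", s) for s in rows]

print(anchored[0])    # bbc
print(doubled[0])     # aabbc
print(normalized[3])  # this has multi whitespace
```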
re_search
Return whether the values match pattern.
Returns True if the regex matches a string and False otherwise.
Parameters
Name | Type | Description | Default |
---|
pattern | str | StringValue | Regular expression to use for searching | required |
Returns
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"s": ["Ibis project", "GitHub"]})
>>> t.s.re_search(".+Hub")
┏━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ RegexSearch(s, '.+Hub') ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ boolean │
├─────────────────────────┤
│ False │
│ True │
└─────────────────────────┘
re_split
Split a string by a regular expression pattern.
Parameters
Name | Type | Description | Default |
---|
pattern | str | StringValue | Regular expression string to split by | required |
Returns
Name | Type | Description |
---|
| ArrayValue | Array of strings from splitting by pattern |
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable(dict(s=["a.b", "b.....c", "c.........a", "def"]))
>>> t.s
┏━━━━━━━━━━━━━┓
┃ s ┃
┡━━━━━━━━━━━━━┩
│ string │
├─────────────┤
│ a.b │
│ b.....c │
│ c.........a │
│ def │
└─────────────┘
>>> t.s.re_split(r"\.+").name("splits")
┏━━━━━━━━━━━━━━━━━━━━━━┓
┃ splits ┃
┡━━━━━━━━━━━━━━━━━━━━━━┩
│ array<string> │
├──────────────────────┤
│ ['a', 'b'] │
│ ['b', 'c'] │
│ ['c', 'a'] │
│ ['def'] │
└──────────────────────┘
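The pattern `\.+` treats any run of consecutive dots as a single separator, which is why `b.....c` yields two elements rather than six. Python's `re.split` shows the same effect on plain strings; this is a local analogy only.

```python
import re

rows = ["a.b", "b.....c", "c.........a", "def"]

# One or more dots act as a single delimiter; "def" has no dots,
# so it comes back as a one-element list.
splits = [re.split(r"\.+", s) for s in rows]

print(splits)  # [['a', 'b'], ['b', 'c'], ['c', 'a'], ['def']]
```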
repeat
Repeat a string n times.
Parameters
Returns
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"s": ["a", "bb", "c"]})
>>> t.s.repeat(5)
┏━━━━━━━━━━━━━━┓
┃ Repeat(s, 5) ┃
┡━━━━━━━━━━━━━━┩
│ string │
├──────────────┤
│ aaaaa │
│ bbbbbbbbbb │
│ ccccc │
└──────────────┘
replace
replace(pattern, replacement)
Replace each exact match of pattern with replacement.
Parameters
Name | Type | Description | Default |
---|
pattern | StringValue | String pattern | required |
replacement | StringValue | String replacement | required |
Returns
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"s": ["abc", "bac", "bca"]})
>>> t.s.replace("b", "z")
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ StringReplace(s, 'b', 'z') ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string │
├────────────────────────────┤
│ azc │
│ zac │
│ zca │
└────────────────────────────┘
reverse
Reverse the characters of a string.
Returns
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"s": ["abc", "def", "ghi"]})
>>> t
┏━━━━━━━━┓
┃ s ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ abc │
│ def │
│ ghi │
└────────┘
>>> t.s.reverse()
┏━━━━━━━━━━━━┓
┃ Reverse(s) ┃
┡━━━━━━━━━━━━┩
│ string │
├────────────┤
│ cba │
│ fed │
│ ihg │
└────────────┘
right
Return up to nchars from the end of each string.
Parameters
Name | Type | Description | Default |
---|
nchars | int | ir.IntegerValue | Maximum number of characters to return | required |
Returns
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"s": ["abc", "defg", "hijlk"]})
>>> t.s.right(2)
┏━━━━━━━━━━━━━━━━┓
┃ StrRight(s, 2) ┃
┡━━━━━━━━━━━━━━━━┩
│ string │
├────────────────┤
│ bc │
│ fg │
│ lk │
└────────────────┘
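"Up to" matters when a string is shorter than nchars: the whole string is returned rather than an error or padding. A plain-Python sketch of that rule follows; the helper name is hypothetical and the handling of zero is an assumption of the sketch.

```python
def right(s: str, nchars: int) -> str:
    """Return at most the last `nchars` characters of s (sketch only)."""
    # Clamp the start index at 0 so short strings are returned whole.
    # A plain s[-nchars:] would wrongly return the full string for nchars == 0.
    return s[max(len(s) - nchars, 0):]


print(right("hijlk", 2))  # lk
print(right("abc", 10))   # abc
```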
rpad
Pad self by truncating or padding on the right.
Parameters
Returns
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"s": ["abc", "def", "ghij"]})
>>> t.s.rpad(5, "-")
┏━━━━━━━━━━━━━━━━━┓
┃ RPad(s, 5, '-') ┃
┡━━━━━━━━━━━━━━━━━┩
│ string │
├─────────────────┤
│ abc-- │
│ def-- │
│ ghij- │
└─────────────────┘
rstrip
Remove whitespace from the right side of string.
Returns
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"s": ["\ta\t", "\nb\n", "\vc\t"]})
>>> t
┏━━━━━━━━┓
┃ s ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ \ta\t │
│ \nb\n │
│ \vc\t │
└────────┘
>>> t.s.rstrip()
┏━━━━━━━━━━━┓
┃ RStrip(s) ┃
┡━━━━━━━━━━━┩
│ string │
├───────────┤
│ \ta │
│ \nb │
│ \vc │
└───────────┘
split
Split a string on delimiter.
This API only works on backends with array support.
Parameters
Name | Type | Description | Default |
---|
delimiter | str | StringValue | Value to split by | required |
Returns
Name | Type | Description |
---|
| ArrayValue | The string split by delimiter |
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"col": ["a,b,c", "d,e", "f"]})
>>> t
┏━━━━━━━━┓
┃ col ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ a,b,c │
│ d,e │
│ f │
└────────┘
>>> t.col.split(",")
┏━━━━━━━━━━━━━━━━━━━━━━━┓
┃ StringSplit(col, ',') ┃
┡━━━━━━━━━━━━━━━━━━━━━━━┩
│ array<string> │
├───────────────────────┤
│ ['a', 'b', ... +1] │
│ ['d', 'e'] │
│ ['f'] │
└───────────────────────┘
startswith
Determine whether self starts with start.
Parameters
Name | Type | Description | Default |
---|
start | str | StringValue | Prefix to check for | required |
Returns
Name | Type | Description |
---|
| BooleanValue | Boolean indicating whether self starts with start |
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"s": ["Ibis project", "GitHub"]})
>>> t.s.startswith("Ibis")
┏━━━━━━━━━━━━━━━━━━━━━━━┓
┃ StartsWith(s, 'Ibis') ┃
┡━━━━━━━━━━━━━━━━━━━━━━━┩
│ boolean │
├───────────────────────┤
│ True │
│ False │
└───────────────────────┘
strip
Remove whitespace from left and right sides of a string.
Returns
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"s": ["\ta\t", "\nb\n", "\vc\t"]})
>>> t
┏━━━━━━━━┓
┃ s ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ \ta\t │
│ \nb\n │
│ \vc\t │
└────────┘
>>> t.s.strip()
┏━━━━━━━━━━┓
┃ Strip(s) ┃
┡━━━━━━━━━━┩
│ string │
├──────────┤
│ a │
│ b │
│ c │
└──────────┘
substr
substr(start, length=None)
Extract a substring.
Parameters
Name | Type | Description | Default |
---|
start | int | ir.IntegerValue | Index of the first character of the substring; indices start at 0 | required |
length | int | ir.IntegerValue | None | Maximum length of each substring. If not supplied, the substring extends to the end of the string | None |
Returns
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"s": ["abc", "defg", "hijlk"]})
>>> t.s.substr(2)
┏━━━━━━━━━━━━━━━━━┓
┃ Substring(s, 2) ┃
┡━━━━━━━━━━━━━━━━━┩
│ string │
├─────────────────┤
│ c │
│ fg │
│ jlk │
└─────────────────┘
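The example above omits length. In plain Python the same zero-based extraction corresponds to slicing, which also makes the effect of length easy to see. The helper below is a hypothetical sketch for non-negative start values only; it is not Ibis code.

```python
from typing import Optional


def substr(s: str, start: int, length: Optional[int] = None) -> str:
    """Slice s from zero-based `start`, taking at most `length` characters."""
    if length is None:
        return s[start:]               # to the end of the string
    return s[start : start + length]   # at most `length` characters


print(substr("hijlk", 2))     # jlk
print(substr("hijlk", 1, 3))  # ijl
```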
to_date
translate
translate(from_str, to_str)
Replace from_str characters in self with characters in to_str.
To avoid unexpected behavior, from_str should be shorter than to_str.
Parameters
Name | Type | Description | Default |
---|
from_str | StringValue | Characters in arg to replace | required |
to_str | StringValue | Characters to use for replacement | required |
Returns
Examples
>>> import ibis
>>> table = ibis.table(dict(string_col="string"))
>>> result = table.string_col.translate("a", "b")
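The example above only builds the expression. To show what a character-by-character mapping looks like, here is a plain-Python analogy using `str.maketrans` and `str.translate`, where the first character of the source set maps to the first character of the target set, and so on. Python requires both arguments to have equal length, so this does not model what a backend does when the lengths differ.

```python
# Plain-Python analogy for positional character mapping; no Ibis involved.
mapping = str.maketrans("ab", "xy")  # "a" -> "x", "b" -> "y"

translated = "abcabc".translate(mapping)
print(translated)  # xycxyc
```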
upper
Convert string to all uppercase.
Returns
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"s": ["aaa", "A", "aa"]})
>>> t
┏━━━━━━━━┓
┃ s ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ aaa │
│ A │
│ aa │
└────────┘
>>> t.s.upper()
┏━━━━━━━━━━━━━━┓
┃ Uppercase(s) ┃
┡━━━━━━━━━━━━━━┩
│ string │
├──────────────┤
│ AAA │
│ A │
│ AA │
└──────────────┘
userinfo
Parse a URL and extract user info.
Examples
>>> import ibis
>>> url = ibis.literal("https://user:pass@example.com:80/docs/books")
>>> result = url.userinfo() # user:pass
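The URL methods on this page (authority, userinfo, host, protocol) each pick out one part of the same URL. As a local illustration of how those parts relate, the sketch below splits the example URL with Python's standard `urllib.parse`; it is an analogy on a plain string and makes no claim about how a backend parses URLs.

```python
from urllib.parse import urlsplit

parts = urlsplit("https://user:pass@example.com:80/docs/books")

protocol = parts.scheme                    # text before "://"
authority = parts.netloc                   # userinfo, host, and port together
userinfo = authority.rpartition("@")[0]    # text before the last "@"
host = parts.hostname                      # host name without userinfo or port

print(protocol)   # https
print(authority)  # user:pass@example.com:80
print(userinfo)   # user:pass
print(host)       # example.com
```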
Returns