Semantic Layer
A SemanticLayer
object in Relta represents the metrics that users can ask questions about per DataSource
.
Structure
A metric can be thought of a SQL View on the tables of the DataSource
. A definition of a metric contains dimensions, which represent columns of the view, and measures, which represents the aggregate functions that can be applied to the dimensions.
An example of a metric is:
{
"name": "carrier_route_analysis",
"description": "This metric provides the count of routes per carrier and identifies the most popular route for each carrier.",
"datasource": "db",
"dimensions": [
{
"name": "carrier",
"description": "The code of the carrier operating the flight.",
"categories": null,
"dtype": "VARCHAR"
},
{
"name": "carrier_name",
"description": "The full name of the carrier.",
"categories": null,
"dtype": "VARCHAR"
},
{
"name": "origin",
"description": "The origin airport code for the flight.",
"categories": null,
"dtype": "VARCHAR"
},
{
"name": "destination",
"description": "The destination airport code for the flight.",
"categories": null,
"dtype": "VARCHAR"
}
],
"measures": [
{
"name": "route_count",
"description": "The total number of routes flown by the carrier.",
"expr": "COUNT(DISTINCT origin || '-' || destination)"
}
],
"sample_questions": [
"Give me the full names of the top 5 carriers that flew the most routes and what their most popular route is."
],
"sql_to_underlying_datasource": "SELECT f.carrier, c.name AS carrier_name, f.origin, f.destination FROM public.flights f JOIN public.carriers c ON f.carrier = c.code"
}
The sql_to_underlying_datasource
field is the SELECT
statement that is in a CREATE VIEW
statement and can contain arbitrary DDL.
The SQL Agent is given the SemanticLayer
which contains a list of metrics, and the SQL Agent only uses dimensions and measures that are defined in the SemanticLayer
.
An example of SQL generated from that metric is:
-- What airline flies the least routes?
SELECT carrier_name,
COUNT(DISTINCT origin || '-' || destination) as route_count
FROM common_routes
GROUP BY carrier_name
ORDER BY route_count ASC LIMIT 1;
Observe that the SQL Agent only uses the dimensions and measures that are defined in the metric, and the metric itself is the view that we pull data from. In fact, the SQL Agent does not have access to the original tables or columns, nor even the create view statement to generate a view -- it only has the dimensions and measures for the metric. So the SQL Agent cannot work with any data not declared in the metric.
Usage
A SemanticLayer
object is automatically created when a DataSource
object is created (either by getting a DataSource
from disk or creating a new one). It is accessible from the semantic_layer
property of a DataSource
.
By default, the SemanticLayer
automatically loads in metrics that have been previously defined in it's directory, which is by default .relta/semantic_layer/{datasource_name}
.
import relta
rc = relta.Client()
source = rc.get_datasource("flights")
print(source.semantic_layer.metrics) # {name: "most_frequent_airlines_by_airport", ...}
To generate metrics from scratch for a semantic layer, you can use the propose
method. We cover how to do this using the CLI in the Getting Started but present it using library code here. Observe that we are only passing in natural language questions, and optionally business context, to the propose
method.
import relta
rc = relta.Client()
source = rc.create_datasource(os.environ["PG_AIRPORTS_DSN"], "airports")
source.semantic_layer.propose([
"Which airport has the longest security line?",
"Which airports have CLEAR?",
"Which airport should I go to if I'm flying to LA from NY, if I want to avoid cost?"
])
If you want to edit a metric, you can do it programmatically or by editing the JSON file.
Refining Metrics
If a user gives feedback on a metric, the relevant Response
is added to the feedback_responses
list of the SemanticLayer
, which you can then call refine
on to suggest updates to the Semantic Layer:
import relta
rc = relta.Client()
source = rc.get_datasource("airports")
chat = rc.create_chat(source)
resp = chat.prompt("which airport is the biggest?")
resp.feedback("negative", "please add support for airport size and passenger volume")
source.semantic_layer.refine(pr=True)
pr=True
will automatically raise a PR with the changes, given your Configuration includes the optional GITHUB_*
variables.
API Reference
Source code in src/relta/semantic/semantic_layer.py
28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 |
|
copy(source, dump=True)
Copy the semantic layer from another DataSource.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source |
DataSource
|
|
required |
# |
from_path (Optional[Union[str, Path]]
|
Path to load the semantic layer from, ignoring |
required |
dump |
bool
|
Whether to dump the semantic layer to it's path. Defaults to True. |
True
|
Source code in src/relta/semantic/semantic_layer.py
dump(clear=True, path=None)
Dumps the semantic layer, accepting any updates made to the semantic layer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
clear |
bool
|
Delete all JSON/YAML files in the path for this layer. Defaults to True. See |
True
|
path |
Optional[Union[str, Path]]
|
Path to dump the semantic layer. If None, uses |
None
|
Source code in src/relta/semantic/semantic_layer.py
dumps(mode='yaml', **kwargs)
Dumps the metrics and examples to a JSON or YAML string. JSON is typically used for feeding into an agent and YAML for display.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
**kwargs |
Keyword arguments to pass to |
{}
|
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
JSON representation of the semantic layer (metrics and examples). |
Source code in src/relta/semantic/semantic_layer.py
load(path=None, json=True, metrics_to_load=None)
Load semantic layer.
Changes to the metrics are not persisted on disk. Use .dump()
to persist them.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path |
Optional[Union[str, Path]]
|
Path to load the semantic layer. If None, uses |
None
|
json |
bool
|
Whether to additionally load the semantic layer from deprecated JSON files.
If a metric exists in both JSON and YAML files by |
True
|
metrics_to_load |
Optional[list[str]]
|
List of metric names to load. If None, loads all metrics. Defaults to None. |
None
|
Source code in src/relta/semantic/semantic_layer.py
propose(queries, context=None)
Proposes a new semantic layer for the given datasource and natural language queries.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
queries |
list[str]
|
A list of natural language queries that the semantic layer should answer. |
required |
context |
Optional[str]
|
Extra information about the datasource. Defaults to None. |
None
|
Source code in src/relta/semantic/semantic_layer.py
refine(pr=False)
Refines the semantic layer based on the feedback and creates a PR with the changes. By default, sets the refined metrics, but does not persist on disk -- see dump()
to persist.
If pr=True
, attempts to create a PR on the configured GitHub repo (see Configuration
) after setting the updating the metrics in the in-memory semantic layer.
If it is successful, returns the URL of the PR. Else, returns None
.
Source code in src/relta/semantic/semantic_layer.py
166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 |
|
Dimension
Bases: BaseModel
A dimension for a metric.
This is similar to a dimension in LookML -- it can be used to group or filter data in a metric. For example, a dimension could be a column in a table or a calculation based on columns in a table.
Source code in src/relta/semantic/base.py
ExampleCollection
Measure
Bases: BaseModel
A measure for a metric.
This is similar to a measure in LookML -- it is an aggregate operation on some dimensions. For example, a measure could be the sum of a column or the average of columnA * columnB.