When should you use point geometries to represent spatial data?
Use points to represent single locations such as tweets, incidents, crimes, or any feature described by a single coordinate pair.
When should you use line geometries?
Use lines (LineStrings) for linear features such as roads, rivers, or paths.
When should you use polygon geometries?
Use polygons for areas such as states, counties, buildings, or boundaries.
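As a minimal sketch of the three basic geometry types, the plain-Python snippet below stores each one as coordinates and computes a simple measure for the line and the polygon. The sample coordinates and function names are illustrative assumptions and are not part of Beast's API.

```python
import math

# Hypothetical sample geometries stored as plain (x, y) tuples.
point = (3.0, 4.0)                                           # a single location
line = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]                 # a path with two segments
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]   # a ring, closed implicitly

def line_length(coords):
    """Sum of the Euclidean lengths of consecutive segments."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))

def polygon_area(ring):
    """Area by the shoelace formula; the last vertex connects back to the first."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0
```

A point has no length or area, a line has length only, and a polygon has area, which is why the choice of geometry type depends on what the feature needs to measure.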
What geometry types does Beast support?
Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection.
How is big spatial data related to the four V’s of big data?
Volume: Spatial datasets can contain millions of features (e.g., boundaries, road segments, satellite imagery).
Velocity: Data from IoT, autonomous vehicles, or sensors arrives continuously.
Variety: Many input formats (CSV, GPX, KML, GeoJSON, shapefiles, GeoTIFF).
Veracity: Spatial data quality varies across sources (government data, satellites, crowdsourcing).
How does spatial partitioning help with query processing?
It divides data into balanced partitions, enabling faster range filters, joins, and load balancing. Beast uses R*-Grove for efficient high-utilization partitions.
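To illustrate why partitioning speeds up range filters, here is a minimal sketch that uses a uniform grid. The grid is swapped in only because it is easy to read; it is not R*-Grove and, unlike R*-Grove, it does not balance partitions when the data is skewed. All names and sample points are hypothetical.

```python
from collections import defaultdict

def grid_partition(points, cell_size):
    """Assign each (x, y) point to a grid cell keyed by (column, row)."""
    partitions = defaultdict(list)
    for x, y in points:
        key = (int(x // cell_size), int(y // cell_size))
        partitions[key].append((x, y))
    return partitions

def range_query(partitions, cell_size, xmin, ymin, xmax, ymax):
    """Scan only the cells that overlap the query rectangle."""
    hits = []
    for col in range(int(xmin // cell_size), int(xmax // cell_size) + 1):
        for row in range(int(ymin // cell_size), int(ymax // cell_size) + 1):
            for x, y in partitions.get((col, row), []):
                if xmin <= x <= xmax and ymin <= y <= ymax:
                    hits.append((x, y))
    return hits

points = [(1, 1), (2, 8), (9, 9), (12, 3), (15, 15)]
parts = grid_partition(points, cell_size=10)
result = range_query(parts, 10, 0, 0, 10, 10)
```

The query touches only the cells that intersect its rectangle instead of every point. In this sample three of the five points land in one cell, which is the kind of skew a balanced partitioner is meant to avoid.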
Why is efficient spatial partitioning important in distributed systems?
Good partitioning ensures even workload distribution, reduces skew, and speeds up spatial operations.
What challenges occur when parsing irregular spatial file formats?
Parsing spatial formats such as shapefiles, GeoJSON, and block-compressed ZIP archives requires:
Decompression
Detecting record boundaries
Handling binary vs text formats
Avoiding partial-record splitting during parallel load
Beast solves this with split-aware parallel parsing.
How does Beast handle parsing irregular or compressed formats in parallel?
For every split except the first, Beast skips to the next compressed block boundary, starts decompressing, and then skips to the next record boundary, so that each partition processes only whole records.
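The record-boundary step can be sketched for the simplest case, an uncompressed file of newline-delimited records. This is an illustration under stated assumptions, not Beast's code: the function name read_split is hypothetical, and the decompression step is omitted.

```python
def read_split(data: bytes, start: int, end: int):
    """Return the whole newline-delimited records owned by the split [start, end)."""
    pos = start
    if start > 0:
        # Not the first split: skip the record that is already in progress.
        # Searching from start - 1 keeps a record that begins exactly at `start`.
        newline = data.find(b"\n", start - 1)
        if newline == -1:
            return []
        pos = newline + 1
    records = []
    # A record belongs to the split in which it starts, so the last record
    # of a split may extend past `end`.
    while pos < end and pos < len(data):
        newline = data.find(b"\n", pos)
        stop = len(data) if newline == -1 else newline
        records.append(data[pos:stop].decode("utf-8"))
        pos = stop + 1
    return records

data = b"alpha\nbravo\ncharlie\ndelta\n"
splits = [(0, 9), (9, 18), (18, 26)]
per_split = [read_split(data, s, e) for s, e in splits]
```

Even though the split boundaries at bytes 9 and 18 fall in the middle of records, every record is read exactly once and none is cut in half.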
What makes visualizing big spatial data challenging?
Large datasets are too big to draw at once, and single-level images lose quality when zoomed.
How does Beast overcome visualization challenges?
Provides plotImage for single-level images.
Provides plotPyramid for multilevel tile-based maps (similar to web map zooming).
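The idea behind a multilevel tile pyramid can be sketched as follows: zoom level z splits the map extent into 2^z by 2^z tiles, so each deeper level shows a smaller area in more detail. The function name, the default extent, and the row and column numbering are assumptions made for illustration; Beast's actual tile indexing may differ.

```python
def tile_for_point(x, y, zoom, extent=(0.0, 0.0, 1.0, 1.0)):
    """Return (zoom, column, row) of the tile that contains the point (x, y)."""
    xmin, ymin, xmax, ymax = extent
    tiles_per_side = 2 ** zoom
    col = int((x - xmin) / (xmax - xmin) * tiles_per_side)
    row = int((y - ymin) / (ymax - ymin) * tiles_per_side)
    # Clamp points that lie on the maximum edge into the last tile.
    col = min(col, tiles_per_side - 1)
    row = min(row, tiles_per_side - 1)
    return zoom, col, row

# The same point falls into one tile per zoom level.
tiles = [tile_for_point(0.3, 0.8, z) for z in range(3)]
```

A viewer only needs to fetch the few tiles that cover the visible area at the current zoom level, which is what keeps rendering cost independent of the total dataset size.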
What is the difference between vector and raster data?
Vector data: Geometries such as points, lines, polygons; used for buildings, states, roads.
Raster data: Gridded pixel data (e.g., satellite images, temperature, vegetation), stored as tiles. Each tile is a 2D array of pixels with values such as Int or Float.
What operations can be applied to raster data?
Local pixel-wise operations, focal (neighborhood) operations, filtering pixels, flattening, rescaling, and raster-vector joins (e.g., RaptorJoin).
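The difference between local and focal operations can be shown on a tiny tile stored as a nested list. This is a plain-Python sketch with hypothetical function names, not Beast's raster API.

```python
def local_op(tile, fn):
    """Local operation: apply fn to every pixel independently."""
    return [[fn(v) for v in row] for row in tile]

def focal_mean(tile):
    """Focal operation: replace each pixel with the mean of its 3x3
    neighborhood, clipped at the tile edges."""
    rows, cols = len(tile), len(tile[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [tile[i][j]
                      for i in range(max(0, r - 1), min(rows, r + 2))
                      for j in range(max(0, c - 1), min(cols, c + 2))]
            out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out

tile = [[0, 10], [20, 30]]
doubled = local_op(tile, lambda v: v * 2)   # each output pixel depends on one input pixel
smoothed = focal_mean(tile)                 # each output pixel depends on its neighbors
```

A local operation needs only the pixel itself, so it parallelizes trivially per tile, whereas a focal operation near a tile edge may need pixels from neighboring tiles.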
What does the raster data model look like in Beast?
Each tile consists of:
A 2D array of pixels
A tile ID
Geolocation metadata
A RasterRDD[T] holds all the tiles.
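A tile with these three parts can be sketched as a small data class. The field names, and the convention that y decreases as the row index grows, are assumptions for illustration and do not reflect Beast's actual classes.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Tile:
    tile_id: int                 # identifies the tile within the raster
    pixels: List[List[float]]    # 2D array, indexed as pixels[row][col]
    origin_x: float              # world x of the tile's top-left corner
    origin_y: float              # world y of the tile's top-left corner
    pixel_size: float            # world units covered by one pixel

    def pixel_center(self, row, col):
        """World coordinates of the center of pixel (row, col)."""
        x = self.origin_x + (col + 0.5) * self.pixel_size
        y = self.origin_y - (row + 0.5) * self.pixel_size
        return x, y

tile = Tile(tile_id=0, pixels=[[1.0, 2.0], [3.0, 4.0]],
            origin_x=100.0, origin_y=50.0, pixel_size=10.0)
```

The geolocation metadata is what turns a bare array index into a position on the map, which is required before pixels can be matched against vector geometries.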
Why is combining raster and vector data useful?
It enables analytics such as overlaying population (vector) on temperature (raster) or combining satellite data with administrative boundaries.
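A naive version of such a raster-vector combination is a zonal mean: average the pixels whose centers fall inside a polygon. The sketch below tests every pixel against the polygon, which is far simpler and slower than RaptorJoin and is shown only to make the idea concrete. All names are hypothetical, and the raster is assumed to start at the origin with y growing with the row index.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices, closed implicitly."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def zonal_mean(pixels, pixel_size, ring):
    """Mean of the pixels whose centers lie inside the polygon, or None."""
    values = []
    for r, row in enumerate(pixels):
        for c, value in enumerate(row):
            cx, cy = (c + 0.5) * pixel_size, (r + 0.5) * pixel_size
            if point_in_polygon(cx, cy, ring):
                values.append(value)
    return sum(values) / len(values) if values else None

temperature = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]   # sample raster values
county = [(0, 0), (2, 0), (2, 2), (0, 2)]                  # sample vector boundary
mean_temp = zonal_mean(temperature, 1.0, county)
```

Here the boundary covers the four pixels in the first two rows and columns, so the result is the average of 10, 20, 40, and 50.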