New big spatial data Flashcards

Question 1

Q

When should you use point geometries to represent spatial data?

Answer

A

Use points to represent single locations, such as geotagged tweets, incidents, crimes, or any feature described by a single coordinate pair.

Question 2

Q

When should you use line geometries?

Answer

A

Use lines (LineStrings) for linear features such as roads, rivers, or paths.

Question 3

Q

When should you use polygon geometries?

Answer

A

Use polygons for features that enclose an area, such as states, counties, buildings, or other bounded regions.
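The sketch below shows why the three geometry types suit different features: a point has only a location, a line has a length, and a polygon encloses an area (computed here with the shoelace formula). It assumes geometries are stored as plain Python tuples of (x, y) coordinates. That representation is hypothetical and is not Beast's geometry classes.

```python
import math

# Hypothetical representation: a coordinate is an (x, y) tuple; a line is a
# list of coordinates; a polygon is one ring of coordinates.
Coord = tuple[float, float]

def line_length(coords: list[Coord]) -> float:
    """Length of a LineString: the sum of its segment lengths."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))

def polygon_area(ring: list[Coord]) -> float:
    """Area of a simple polygon via the shoelace formula.
    Accepts an open ring or a closed one (first vertex repeated at the end)."""
    if ring[0] == ring[-1]:
        ring = ring[:-1]
    twice_area = 0.0
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

tweet: Coord = (-117.33, 33.98)                # point: a location only
road = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]   # line: has a length
block = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]  # polygon: has an area

print(line_length(road))    # 11.0
print(polygon_area(block))  # 12.0
```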

Question 4

Q

What geometry types does Beast support?

Answer

A

Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection.
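To make the list concrete, the sketch below writes each of these types in Well-Known Text (WKT), a standard text encoding for geometries. The `to_wkt` function and its nested-list input format are hypothetical illustrations and are not part of Beast's API.

```python
# Hypothetical minimal WKT formatter. Coordinates are (x, y) tuples; a ring or
# line is a list of tuples; a polygon is a list of rings (outer ring first).

def _coords(points):
    return ", ".join(f"{x:g} {y:g}" for x, y in points)

def _rings(rings):
    return ", ".join(f"({_coords(r)})" for r in rings)

def to_wkt(kind, data):
    if kind == "Point":
        x, y = data
        return f"POINT ({x:g} {y:g})"
    if kind == "LineString":
        return f"LINESTRING ({_coords(data)})"
    if kind == "Polygon":             # data: list of rings
        return f"POLYGON ({_rings(data)})"
    if kind == "MultiPoint":          # data: list of points
        return "MULTIPOINT (" + ", ".join(f"({x:g} {y:g})" for x, y in data) + ")"
    if kind == "MultiLineString":     # data: list of lines
        return f"MULTILINESTRING ({_rings(data)})"
    if kind == "MultiPolygon":        # data: list of polygons
        return "MULTIPOLYGON (" + ", ".join(f"({_rings(p)})" for p in data) + ")"
    if kind == "GeometryCollection":  # data: list of (kind, data) pairs
        return "GEOMETRYCOLLECTION (" + ", ".join(to_wkt(k, d) for k, d in data) + ")"
    raise ValueError(f"unsupported geometry type: {kind}")

print(to_wkt("Point", (1, 2)))  # POINT (1 2)
print(to_wkt("GeometryCollection", [("Point", (1, 2)), ("LineString", [(0, 0), (1, 1)])]))
# GEOMETRYCOLLECTION (POINT (1 2), LINESTRING (0 0, 1 1))
```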

Question 5

Q

How is big spatial data related to the four V’s of big data?

Answer

A

Volume: Spatial datasets can contain millions of features (e.g., state boundaries, road networks, satellite observations).

Velocity: Data from IoT devices, autonomous vehicles, and other sensors arrives continuously.

Variety: Data comes in many formats (CSV, GPX, KML, GeoJSON, shapefiles, GeoTIFF).

Veracity: Spatial data quality varies across sources (government data, satellites, crowdsourcing).

Question 6

Q

How does spatial partitioning help with query processing?

Answer

A

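It divides the data into spatially coherent, balanced partitions. Range filters and joins can then skip partitions that cannot contain matches, and work is spread evenly across machines (load balancing). Beast uses R*-Grove to build balanced, highly utilized partitions.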
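The sketch below illustrates the idea with a simplified Sort-Tile-Recursive (STR) style partitioner. STR is a different and much simpler technique than R*-Grove. The example assumes point data held in memory. A range query first checks each partition's minimum bounding rectangle (MBR) and skips partitions that cannot contain matches. All function names are hypothetical.

```python
import math

# Simplified STR-style partitioning (a stand-in, NOT R*-Grove): sort by x into
# vertical slices, then sort each slice by y and cut it into equal-size parts.

def str_partition(points, num_parts):
    """Split points into about num_parts partitions of similar size."""
    slices = max(1, round(math.sqrt(num_parts)))
    per_slice = math.ceil(len(points) / slices)
    by_x = sorted(points)
    parts = []
    for s in range(0, len(by_x), per_slice):
        vertical = sorted(by_x[s:s + per_slice], key=lambda p: p[1])
        per_part = math.ceil(len(vertical) / slices)
        for t in range(0, len(vertical), per_part):
            parts.append(vertical[t:t + per_part])
    return parts

def mbr(part):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a partition."""
    xs = [x for x, _ in part]
    ys = [y for _, y in part]
    return (min(xs), min(ys), max(xs), max(ys))

def range_query(parts, query):
    """Return matching points and the number of partitions actually scanned."""
    qx1, qy1, qx2, qy2 = query
    hits, scanned = [], 0
    for part in parts:
        x1, y1, x2, y2 = mbr(part)
        if x2 < qx1 or x1 > qx2 or y2 < qy1 or y1 > qy2:
            continue  # MBR does not overlap the query box: skip the partition
        scanned += 1
        hits.extend(p for p in part if qx1 <= p[0] <= qx2 and qy1 <= p[1] <= qy2)
    return hits, scanned

points = [(x, y) for x in range(10) for y in range(10)]  # 100 grid points
parts = str_partition(points, 4)
print([len(p) for p in parts])  # [25, 25, 25, 25]
hits, scanned = range_query(parts, (0, 0, 2, 2))
print(len(hits), scanned)       # 9 1
```

In this example, the small query touches only one of the four partitions. A distributed engine gets the same saving by never reading the skipped partitions.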

Question 7

Q

Why is efficient spatial partitioning important in distributed systems?

Answer

A

Good partitioning distributes the workload evenly across machines, reduces skew (a few overloaded partitions that delay the whole job), and speeds up spatial operations.
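Here is a one-dimensional sketch of skew, using hypothetical data and function names. Splitting clustered data into equal-width cells overloads one partition. Placing boundaries at quantiles of the data instead keeps partitions about the same size. Imbalance here is the largest partition divided by the average partition size. Real spatial partitioners work in two dimensions, but the effect is the same.

```python
import bisect

# Hypothetical 1D illustration of partition skew. Imbalance is the largest
# partition size divided by the average partition size (1.0 is perfect).

def equal_width_counts(xs, k, lo, hi):
    """Count records per cell when [lo, hi] is cut into k equal-width cells."""
    counts = [0] * k
    width = (hi - lo) / k
    for x in xs:
        i = min(int((x - lo) / width), k - 1)  # clamp x == hi into the last cell
        counts[i] += 1
    return counts

def quantile_bounds(xs, k):
    """Choose k - 1 boundaries at data quantiles so cells hold similar counts."""
    ordered = sorted(xs)
    return [ordered[i * len(ordered) // k] for i in range(1, k)]

def counts_by_bounds(xs, bounds):
    """Count records per cell for sorted boundaries (cell i is [b[i-1], b[i]))."""
    counts = [0] * (len(bounds) + 1)
    for x in xs:
        counts[bisect.bisect_right(bounds, x)] += 1
    return counts

def imbalance(counts):
    return max(counts) / (sum(counts) / len(counts))

# Skewed data: 90 records packed near 0 (a dense "downtown"), 10 spread out.
xs = [i / 100 for i in range(90)] + [10 + 9 * i for i in range(10)]

grid = equal_width_counts(xs, 4, 0, 100)
balanced = counts_by_bounds(xs, quantile_bounds(xs, 4))
print(grid, imbalance(grid))          # [92, 3, 3, 2] 3.68
print(balanced, imbalance(balanced))  # [25, 25, 25, 25] 1.0
```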

Question 8

Q

What challenges occur when parsing irregular spatial file formats?

Answer

A

Spatial formats such as shapefiles, GeoJSON, and block-compressed or ZIP files require:
Decompression
Detecting record boundaries
Handling both binary and text formats
Avoiding splitting a record across partitions during parallel loading
Beast addresses these with split-aware parallel parsing.

Question 9

Q

How does Beast handle parsing irregular or compressed formats in parallel?

Answer

A

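For every split except the first, Beast skips ahead to the next compressed-block boundary, starts decompressing there, and then skips to the next record boundary. Each split reads past its end to finish its last record. As a result, every partition contains only whole records and no record is read twice.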
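The sketch below covers only the record-boundary part of this procedure. It assumes uncompressed, newline-delimited records (for example, one CSV row or one GeoJSON feature per line) and omits the compressed-block step. Each split owns the records that start inside it. Every split except the first skips forward to the first record boundary, and every split reads past its end offset to finish its last record. The function names are hypothetical.

```python
def read_split(data: bytes, start: int, end: int) -> list[bytes]:
    """Return the newline-terminated records that begin in [start, end)."""
    if start == 0:
        pos = 0
    else:
        # Skip to the first record boundary at or after `start`. Searching from
        # start - 1 keeps a record that begins exactly at `start`.
        nl = data.find(b"\n", start - 1)
        if nl == -1:
            return []  # no record begins in this split
        pos = nl + 1
    records = []
    while pos < end and pos < len(data):
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl
        records.append(data[pos:stop])  # may read past `end` to finish a record
        pos = stop + 1
    return records

def read_all(data: bytes, split_size: int) -> list[bytes]:
    """Simulate parallel loading: parse each split independently, then combine."""
    out = []
    for start in range(0, len(data), split_size):
        out.extend(read_split(data, start, min(start + split_size, len(data))))
    return out

data = b"r1\nrecord-2\nr3\nrec4\n"
print(read_split(data, 0, 5))   # [b'r1', b'record-2']
print(read_split(data, 5, 10))  # []
print(read_all(data, 5))        # [b'r1', b'record-2', b'r3', b'rec4']
```

In this example, the split [5, 10) falls entirely inside `record-2`, so it returns nothing. The first split finishes that record instead.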

Question 10

Q

What makes visualizing big spatial data challenging?

Answer

A

Large datasets are too big to draw at once, and a single-level image loses detail when the user zooms in.

Question 11

Q

How does Beast overcome visualization challenges?

Answer

A

It provides plotImage for single-level images.

It provides plotPyramid for multilevel, tile-based maps (similar to zooming in a web map).
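Multilevel maps are commonly organized like web map tiles. Zoom level z divides the data's bounding box into 2^z × 2^z tiles, so the tile count grows as 4^z. The sketch below assigns points to tiles under that assumption. It illustrates the scheme only and is not Beast's plotPyramid implementation; the function names are hypothetical.

```python
def tile_of(x, y, z, extent):
    """Return (z, col, row) of the tile containing (x, y); row 0 is the top."""
    xmin, ymin, xmax, ymax = extent
    n = 2 ** z  # tiles per side at this zoom level
    col = min(int((x - xmin) / (xmax - xmin) * n), n - 1)  # clamp the right edge
    row = min(int((ymax - y) / (ymax - ymin) * n), n - 1)  # clamp the bottom edge
    return (z, col, row)

def tiles_at(z):
    """Total number of tiles at zoom level z (2**z by 2**z)."""
    return 4 ** z

def tile_counts(points, z, extent):
    """Count points per non-empty tile; empty tiles never appear."""
    counts = {}
    for x, y in points:
        key = tile_of(x, y, z, extent)
        counts[key] = counts.get(key, 0) + 1
    return counts

extent = (0, 0, 100, 100)
pts = [(10, 90), (12, 88), (60, 10)]
print(tile_counts(pts, 1, extent))  # {(1, 0, 0): 2, (1, 1, 1): 1}
print(tiles_at(10))                 # 1048576
```

Level 10 alone has over a million tiles. A renderer therefore benefits from producing only the non-empty tiles that `tile_counts` identifies.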

Question 12

Q

What is the difference between vector and raster data?

Answer

A

Vector data: Geometries such as points, lines, and polygons, used to represent features such as buildings, states, and roads.

Raster data: Gridded pixel data (e.g., satellite images, temperature, vegetation), stored as tiles. Each tile is a 2D array of pixels whose values have a numeric type such as Int or Float.

Question 13

Q

What operations can be applied to raster data?

Answer

A

Local pixel-wise operations, focal (neighborhood) operations, filtering pixels, flattening, rescaling, and raster-vector joins (e.g., RaptorJoin).
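The sketch below shows several of these operation types on a single tile stored as a 2D list of numbers. The function names and edge handling are assumptions for illustration, not Beast's API. Focal windows are truncated at tile borders, and rescaling averages blocks, assuming the tile dimensions are divisible by the factor.

```python
def local_op(tile, f):
    """Local (pixel-wise): apply f to each pixel independently."""
    return [[f(v) for v in row] for row in tile]

def focal_mean(tile):
    """Focal: each output pixel is the mean of its 3x3 neighborhood;
    neighbors outside the tile are ignored."""
    h, w = len(tile), len(tile[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            vals = [tile[a][b]
                    for a in range(max(0, i - 1), min(h, i + 2))
                    for b in range(max(0, j - 1), min(w, j + 2))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out

def filter_pixels(tile, pred):
    """Filter: keep (row, col, value) for pixels that satisfy pred."""
    return [(i, j, v) for i, row in enumerate(tile) for j, v in enumerate(row) if pred(v)]

def flatten(tile):
    """Flatten: turn the 2D tile into a flat list of (row, col, value)."""
    return [(i, j, v) for i, row in enumerate(tile) for j, v in enumerate(row)]

def rescale(tile, factor):
    """Rescale (downsample): average non-overlapping factor x factor blocks.
    Assumes both tile dimensions are divisible by factor."""
    h, w = len(tile), len(tile[0])
    return [[sum(tile[i + a][j + b] for a in range(factor) for b in range(factor)) / factor ** 2
             for j in range(0, w, factor)]
            for i in range(0, h, factor)]

tile = [[1, 2], [3, 4]]
print(local_op(tile, lambda v: v * 10))      # [[10, 20], [30, 40]]
print(focal_mean(tile))                      # [[2.5, 2.5], [2.5, 2.5]]
print(filter_pixels(tile, lambda v: v > 2))  # [(1, 0, 3), (1, 1, 4)]
print(rescale(tile, 2))                      # [[2.5]]
```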

Question 14

Q

What does the raster data model look like in Beast?

Answer

A

Each tile has:
A 2D array of pixels
A tile ID
Geolocation metadata
A RasterRDD[T], where T is the pixel type, holds all tiles of a dataset.
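Here is a hypothetical Python model of one tile with these three parts. The geolocation metadata is assumed to be a north-up grid described by the tile's top-left corner and pixel size, which is a common convention. Beast's actual tile classes and metadata may differ. A plain list of tiles stands in for RasterRDD[T].

```python
from dataclasses import dataclass

@dataclass
class Tile:
    tile_id: int
    pixels: list[list[float]]  # pixels[row][col]
    x0: float                  # world x of the tile's top-left corner (assumed layout)
    y0: float                  # world y of the tile's top-left corner
    pixel_w: float             # world units per pixel column
    pixel_h: float             # world units per pixel row (y decreases downward)

    def pixel_center(self, row: int, col: int) -> tuple[float, float]:
        """World coordinates of the center of pixel (row, col)."""
        return (self.x0 + (col + 0.5) * self.pixel_w,
                self.y0 - (row + 0.5) * self.pixel_h)

    def value_at(self, x: float, y: float):
        """Pixel value covering world point (x, y), or None if outside the tile."""
        col = int((x - self.x0) // self.pixel_w)
        row = int((self.y0 - y) // self.pixel_h)
        if 0 <= row < len(self.pixels) and 0 <= col < len(self.pixels[0]):
            return self.pixels[row][col]
        return None

def lookup(tiles: list[Tile], x: float, y: float):
    """Find the value at (x, y) in a collection of tiles (stand-in for RasterRDD[T])."""
    for t in tiles:
        v = t.value_at(x, y)
        if v is not None:
            return v
    return None

t = Tile(tile_id=0, pixels=[[1.0, 2.0], [3.0, 4.0]], x0=0.0, y0=10.0, pixel_w=5.0, pixel_h=5.0)
print(t.value_at(7, 8))      # 2.0
print(t.pixel_center(1, 0))  # (2.5, 2.5)
print(t.value_at(11, 5))     # None
```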

Question 15

Q

Why is combining raster and vector data useful?

Answer

A

It enables analytics such as overlaying population (vector) on temperature (raster) or combining satellite data with administrative boundaries.
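The brute-force sketch below shows what a raster-vector join computes. For each polygon, it averages the raster pixels whose centers fall inside that polygon, which is often called a zonal statistic. It assumes a single grid with its top-left corner at (x0, y0) and square pixels of side `size`, and it uses hypothetical names. This is not the RaptorJoin algorithm, which is designed to avoid a point-in-polygon test for every pixel.

```python
def point_in_polygon(x, y, ring):
    """Even-odd ray casting test; ring is a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def zonal_mean(pixels, x0, y0, size, polygons):
    """Return {name: mean of pixels whose centers lie in the polygon} (None if none)."""
    result = {}
    for name, ring in polygons.items():
        vals = []
        for r, row in enumerate(pixels):
            for c, v in enumerate(row):
                cx, cy = x0 + (c + 0.5) * size, y0 - (r + 0.5) * size
                if point_in_polygon(cx, cy, ring):
                    vals.append(v)
        result[name] = sum(vals) / len(vals) if vals else None
    return result

temperature = [[10, 10, 20, 20],
               [10, 10, 20, 20],
               [30, 30, 40, 40],
               [30, 30, 40, 40]]
counties = {
    "west": [(0, 0), (2, 0), (2, 4), (0, 4)],
    "southeast": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
print(zonal_mean(temperature, 0, 4, 1, counties))  # {'west': 20.0, 'southeast': 40.0}
```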

(15 cards)