background

The original track data is individual GPS points, and our common requirements are generally

  • To combine a series of points into a complete trajectory;
  • The whole track is divided into multiple segments according to the conditions (time, speed, stop point, etc.);
  • Obtain statistics of a trajectory, such as starting point, distance, time, etc.

Note: Aliyun Ganos provides more perfect trajectory model and analysis functions, but only PostGIS native functions are enough to meet our basic needs.

Use PostGIS functions

The PostGIS functions used are as follows

  • ST_MakePointM
  • ST_SetSRID
  • ST_MakeLine
  • ST_IsValidTrajectory
  • ST_NPoints
  • ST_Length
  • ST_M
  • ST_StartPoint
  • ST_EndPoint
  • ST_DumpPoints

The data source

T-drive exposes data sources

Location: Beijing, China, Frequency: 180 seconds, duration: 7 days, area size: 750 km2, Vehicle type: Taxi, Number of vehicles: 10,357

The data format is as follows

1200-02-02 15:36:08, 116.51172, 39.92123 1200-02-02 15:46:08, 116.51135, 39.93883 1200-02-02 15:56:08, 116.51627, 39.91034 meanings taxi id field, the date time, longitude, latitudeCopy the code

Raw data is stored

Add Kettle data to the database in batches. The table is as follows

drop table if exists demo.t_taxi_gps cascade;
create table demo.t_taxi_gps (tid int, ts timestamp, lng float, lat float);
Copy the code

Query sample data

select tg.* from demo.t_taxi_gps tg limit 2 ;
  tid  |         ts          |    lng    |   lat
-------+---------------------+-----------+----------
 10012 | 2008-02-02 13:33:43 | 116.46235 | 39.88151
 10012 | 2008-02-02 13:34:12 | 116.46232 | 39.88147
(2 rows)
Copy the code

Trajectory subsection

The business requirement is to segment the historical trajectory by vehicle ID+ date.

PostGIS does not provide the trajectory data type. LinestringM is used to simulate the data. Ideas as follows

  • Use the Geometry type and set SRID=4326 (WGS84 coordinate system).
  • Use the PointM type to save specific trackpoints, where M holds the Unix format of the timestamp.
  • Use LinestringM to hold a series of points that are constructed as trace data.

Create track table

drop table if exists demo.t_taxi_trajectory cascade;

Use the Geometry type and set SRID=4326
create table if not exists demo.t_taxi_trajectory
(
	tid int, 
	dt date, 
	traj geometry(linestringm, 4326),
	primary key (tid, dt)
);
Copy the code

Trajectory segmentation algorithm

The algorithm is simple, but there are a few caveats

  • According to the vehicle + timestamp, data deduplication is carried out in order to ensure that legitimate track data is finally obtained.
  • The Timestamp Without Time Zone type is used in the original data to save the Timestamp. However, when converting the Timestamp to Unix Time, ensure that the converted type contains the Time Zone information; otherwise, the default Time is UTC.
with
Remove redo and construct pointM
The time zone information is required for the conversion type; otherwise, the time will default to UTC
point_m as
(
	select distinct on (tid, ts) 
			gt.tid, gt.ts, gt.lng, gt.lat, 
			date_trunc('day', gt.ts)::date dt,
			st_setsrid( st_makepointm( gt.lng, gt.lat, extract(epoch from gt.ts::timestamp with time zone)), 4326) pt
	from demo.t_taxi_gps gt
	order by tid, ts
),
- Creating trajectory
-- array_agg with order by
traj_raw as
(
	select pm.tid, pm.dt,
		st_makeline( array_agg(pm.pt order by pm.ts )::geometry[] ) as traj 	
	from point_m pm 
	group by pm.tid, pm.dt
)
-- 
insert into demo.t_taxi_trajectory
	select tid, dt, traj from traj_raw tr
;
Copy the code

Example Query basic trace information

Query basic information about trajectories.

Note: We store LinestringM in the Geometry type. If we use ST_Length() directly to find the length of the track, the return value is in degrees instead of meters. So you need to cast the geography type and return the result in meters.

Convert to geography to calculate the distance, returning in meters
select tr.tid, tr.dt,
	st_npoints(tr.traj) as point_count,
	(st_length(tr.traj::geography)/1000) : :int as length_kms,
	to_timestamp( st_m( st_startpoint(tr.traj) ) ) ts_start, 
	to_timestamp( st_m( st_endpoint(tr.traj) ) ) ts_end,
	ST_IsValidTrajectory(tr.traj) as valid_flag
from demo.t_taxi_trajectory tr
where tr.tid  in (28)
	order by tr.tid, tr.dt
limit 3
;
 tid |     dt     | point_count | length_kms |        ts_start        |         ts_end         | valid_flag
-----+------------+-------------+------------+------------------------+------------------------+------------
  28 | 2008-02-02 |         387 |        190 | 2008-02-02 13:33:16+08 | 2008-02-02 23:57:24+08 | t
  28 | 2008-02-03 |         885 |        331 | 2008-02-03 00:00:45+08 | 2008-02-03 23:58:18+08 | t
  28 | 2008-02-04 |         930 |        353 | 2008-02-04 00:00:09+08 | 2008-02-04 23:59:46+08 | t
(3 rows)
Copy the code

Trajectory reduction

Data saved in the Trajectory model can be easily restored using ST_DumpPoints.

However, other attributes in the original GPS information (speed, direction, etc.) cannot be saved and restored.

-- Restore the trace point
with dump_pt as
(
	select (st_dumppoints(tr.traj)).geom as pt 
	from demo.t_taxi_trajectory tr
	where tr.tid=28 
		and tr.dt = '2008-02-02' 
)
select 
	st_x(pt) as lng, st_y(pt) as lat, to_timestamp(st_m(pt)) as ts
from dump_pt 
order by ts
limit 3
;
    lng    |   lat    |           ts
-----------+----------+------------------------| | | 2008-02-02 13:33:16 40.0012 + 116.40818 08 116.42549 40.00747 116.44973 40.00647 | | | 13:37:18 + 2008-02-02 08 2008-02-02 13:41:09+08 (3 rows)Copy the code