
Drone Swarm Search: The Coverage Environment

About

The Coverage Environment is our second training environment. It builds on the PettingZoo framework and supports both multi-agent and single-agent setups with reinforcement learning algorithms. Whereas the Search Environment aims to locate survivors, the Coverage Environment focuses on spreading the search over the most probable area as efficiently as possible, in minimal time. Unlike the Search Environment, it is static, and it employs a Lagrangian particle model that integrates real tidal data to represent the area of highest probability. The simulation uses the open-source library OpenDrift. A visual representation of the environment is shown below. To explore the environment without an algorithm, run the script basic_coverage.py.


Fig 1: Representation of the environment in the Coverage Environment.

Quick Start

Warning

The DSSE project requires Python version 3.10.5 or higher.

To install GDAL (a requirement for using OpenDrift), you may need to install the following packages:

sh
sudo apt install -y libgdal-dev gdal-bin

For Windows, Microsoft Visual C++ 14.0 or greater is required for building.

Tip

After the environment class is instantiated, the simulation may take a while to start, depending on your internet connection.

Install

pip install DSSE[coverage]

Use

basic_coverage.py:
python
from DSSE import CoverageDroneSwarmSearch

env = CoverageDroneSwarmSearch(
    drone_amount=3,
    render_mode="human",
    disaster_position=(-24.04, -46.17),  # (lat, long)
    pre_render_time=10, # hours to simulate
)

opt = {
    "drones_positions": [(0, 10), (10, 10), (20, 10)],
}
obs, info = env.reset(options=opt)

step = 0
while env.agents:
    step += 1
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)

print(infos["drone0"])

General Info

| Import            | from DSSE import CoverageDroneSwarmSearch |
| Action Space      | Discrete (9)                              |
| Action Values     | [0, 1, 2, 3, 4, 5, 6, 7, 8]               |
| Observation Space | {droneN: ((x, y), probability_matrix)}    |

Action Space

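| Value | Meaning             |
|-------|---------------------|
| 0     | Move Left           |
| 1     | Move Right          |
| 2     | Move Up             |
| 3     | Move Down           |
| 4     | Diagonal Up Left    |
| 5     | Diagonal Up Right   |
| 6     | Diagonal Down Left  |
| 7     | Diagonal Down Right |
| 8     | Do nothing          |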
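
The action values above can be turned into grid displacements with a small lookup. This is only a sketch: `ACTION_DELTAS` and `apply_action` are hypothetical helpers, not part of DSSE. It assumes that x grows to the right and y grows downward (so "Up" decreases y), and that a move leaving the grid keeps the drone in place. The environment's own conventions may differ.

```python
# Hypothetical helpers, not part of DSSE.
# Assumption: x grows to the right and y grows downward, so "Up" means y - 1.
ACTION_DELTAS = {
    0: (-1, 0),   # Move Left
    1: (1, 0),    # Move Right
    2: (0, -1),   # Move Up
    3: (0, 1),    # Move Down
    4: (-1, -1),  # Diagonal Up Left
    5: (1, -1),   # Diagonal Up Right
    6: (-1, 1),   # Diagonal Down Left
    7: (1, 1),    # Diagonal Down Right
    8: (0, 0),    # Do nothing
}


def apply_action(position, action, grid_size):
    """Return the position after `action`, staying put if the move leaves the grid."""
    x, y = position
    dx, dy = ACTION_DELTAS[action]
    new_x, new_y = x + dx, y + dy
    if 0 <= new_x < grid_size and 0 <= new_y < grid_size:
        return (new_x, new_y)
    return (x, y)


print(apply_action((5, 5), 4, grid_size=10))  # Diagonal Up Left from (5, 5)
```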

Tip

We incorporated 8 actions in this environment to enable the use of agents trained here in the Search Environment as well.

Inputs

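| Inputs                          | Possible Values   | Default Values   |
|---------------------------------|-------------------|------------------|
| render_mode                     | "ansi" or "human" | "ansi"           |
| render_grid                     | bool              | True             |
| render_gradient                 | bool              | True             |
| timestep_limit                  | int               | 100              |
| disaster_position               | (float, float)    | (-24.04, -46.17) |
| drone_amount                    | int               | 1                |
| drone_speed                     | int               | 10               |
| drone_probability_of_detection  | float             | 1.0              |
| pre_render_time                 | int               | 10               |
| prob_matrix_path                | string            | None             |
| particle_amount                 | int               | 50,000           |
| particle_radius                 | int               | 800              |
| num_particle_to_filter_as_noise | int               | 1                |
| start_time                      | datetime          | None             |
| grid_cell_size                  | int               | 130              |
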
  • render_mode:

    • ansi: This mode presents no visualization and is intended to train the reinforcement learning algorithm.
    • human: This mode presents a visualization of the drones actively searching the target, as well as the visualization of the person moving according to the input vector.
  • render_grid: If set to True along with render_mode = "human", the visualization will be rendered with a grid. If set to False, there will be no grid when rendering.

  • render_gradient: If set to True along with render_mode = "human", the colors in the visualization are interpolated according to the probability of each cell. Otherwise, each cell is drawn in a solid color based on its value, with the matrix values normalized between 0 and 1: green if 0.75 <= value < 1, yellow if 0.25 <= value < 0.75, and red if value < 0.25.

  • timestep_limit: An integer that defines the maximum length of an episode, that is, the number of steps that can be taken before the environment ends or must be reset.

  • disaster_position: A tuple of two floats giving the latitude and longitude of the event that led to PIW (Persons In Water). The default value is (-24.04, -46.17), a point near the coast of Guaruja, Brazil.

  • drone_amount: Specifies the number of drones to be used in the simulation. This integer parameter can be adjusted to simulate scenarios with different drone counts.

  • drone_speed: An integer parameter that sets the drones' speed in the simulation, measured in meters per second (m/s). Adjust this value to simulate drones operating at various speeds.

  • drone_probability_of_detection: This float parameter signifies the probability of a drone detecting an object of interest. Changing this value allows the user to simulate different detection probabilities.

  • pre_render_time: This int parameter specifies the amount of time (hours) to pre-render the simulation before starting. Adjusting this value lets the user control the pre-rendering time of the simulation.

  • prob_matrix_path: This string parameter allows the user to specify the path to a file containing an already simulated probability matrix. The file should be a .npy file containing a probability matrix. If this parameter is not specified, the environment will generate a new probability matrix.

  • particle_amount: This int parameter allows the user to customize the number of particles used in the Lagrangian particle model used to create the probability matrix.

  • particle_radius: This int parameter allows the user to customize the radius within which the particles are randomly placed at the start of the Lagrangian particle model simulation.

  • num_particle_to_filter_as_noise: This int parameter sets the particle count at or below which a cell is treated as noise and filtered to zero (e.g., if the value is 1, cells containing only 1 particle at the end of the simulation are set to 0).

  • start_time: This datetime parameter allows the user to specify the start time of the simulation. If not specified, the simulation will start at the current time.

  • grid_cell_size: This int parameter allows the user to specify the size of the grid cells in meters. The default value is 130 meters.
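
The solid-color rule described for render_gradient = False can be written as a small function. This is an illustrative sketch only: `solid_cell_color` is a hypothetical name and not part of DSSE. The thresholds are the ones listed above. Treating a value of exactly 1 as green is an assumption, since the description leaves that case open.

```python
def solid_cell_color(value):
    """Map a normalized cell value in [0, 1] to the solid color described above.

    Hypothetical helper, not part of DSSE.
    Assumption: a value of exactly 1 is also drawn in green.
    """
    if value >= 0.75:
        return "green"
    if value >= 0.25:
        return "yellow"
    return "red"


for value in (0.9, 0.5, 0.1):
    print(value, solid_cell_color(value))
```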

Built-in Functions

env.reset:

The env.reset() method reinitializes the environment to its initial state. To customize the starting conditions, such as drone positions, you can pass an options dictionary to the method. Here is how to structure this dictionary and use the reset() method:

python
opt = {
    "drones_positions": [(10, 5), (10, 10)],
}
observations, info = env.reset(options=opt)

Parameters in the options Dictionary:

  • drones_positions: Specifies the initial [(x, y), (x, y), ...] coordinates for each drone. Ensure this list contains an entry for each drone in the environment.

Default Behavior:

Without any arguments, env.reset() will place drones sequentially from left to right in adjacent cells. When there are no more available cells in a row, it moves to the next row and continues from left to right.
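
The default placement described above can be sketched as follows. `default_drone_positions` is a hypothetical helper written for illustration, not the library's code. It assumes a square grid, that placement starts at cell (0, 0), and that positions are (x, y) with x as the column.

```python
def default_drone_positions(drone_amount, grid_size):
    """Place drones left to right in adjacent cells, wrapping to the next row.

    Hypothetical helper, not part of DSSE.
    Assumptions: square grid, placement starts at (0, 0), positions are (x, y).
    """
    if drone_amount > grid_size * grid_size:
        raise ValueError("There are more drones than grid cells.")
    return [(i % grid_size, i // grid_size) for i in range(drone_amount)]


# Five drones on a grid that is three cells wide: the fourth drone wraps to row 1.
print(default_drone_positions(5, grid_size=3))
```

A list built this way has the same shape as the drones_positions entry of the options dictionary, one (x, y) tuple per drone.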

Return Values:

The method returns an observations dictionary containing observations for all drones, which provides insights into the environment's state immediately after the reset. The info dictionary contains additional information about the environment.

env.step:

The env.step() method defines each drone's next movement. It requires a dictionary input where each key is a drone's name and its corresponding value is the action to be taken. For instance, in an environment initialized with 10 drones, the method call would look like this:

python
env.step({
    'drone0': 2, 'drone1': 3, 'drone2': 2, 'drone3': 5, 'drone4': 1,
    'drone5': 0, 'drone6': 2, 'drone7': 5, 'drone8': 0, 'drone9': 1
})

Warning

Every drone in the environment must have an associated action in the dictionary. If any drone is omitted or if an action is not specified for a drone, the method will raise an error.
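
To catch an incomplete dictionary before calling env.step(), the actions can be checked first. The sketch below is a hypothetical helper (`validate_actions` is not part of DSSE). It assumes valid actions are the integers 0 to 8 from the Action Space table and raises ValueError, which is not necessarily the exception type the environment itself raises.

```python
def validate_actions(agents, actions, n_actions=9):
    """Raise ValueError if a drone has no action or an action is out of range.

    Hypothetical helper, not part of DSSE.
    """
    missing = [agent for agent in agents if agent not in actions]
    if missing:
        raise ValueError(f"Missing actions for: {missing}")
    invalid = {agent: action for agent, action in actions.items()
               if action not in range(n_actions)}
    if invalid:
        raise ValueError(f"Invalid actions: {invalid}")


agents = ["drone0", "drone1", "drone2"]
validate_actions(agents, {"drone0": 2, "drone1": 3, "drone2": 8})  # passes silently

try:
    validate_actions(agents, {"drone0": 2, "drone1": 3})  # drone2 is missing
except ValueError as error:
    print(error)
```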

The method returns a tuple containing the following elements in order:

  • Observation: The new state of the environment after the step.
  • Reward: The immediate reward obtained after the action.
  • Termination: Indicates whether the episode has ended (e.g., all cells with probability > 0 have been searched, or the timestep limit was exceeded).
  • Truncation: Indicates whether the episode was truncated (e.g., through a timeout).
  • Info: A dictionary containing auxiliary diagnostic information.

Probability Matrix

The probability matrix is created using a Lagrangian particle simulation run with the OpenDrift library. In this process, particles are released at the disaster site and drift with the water currents. Particles that reach the coast are removed from the simulation. When the simulation ends, the final locations of the remaining particles are recorded, and the matrix is built from the number of particles in each grid cell.


Fig 2: OpenDrift simulation over 40 hours, starting at coordinates (-24.04, -46.17).

As the GIF above shows, once the OpenDrift simulation has finished, we take the final positions of the particles and use them to construct the probability matrix. These final positions are shown as blue particles in the image below.


Fig 3: Final positions of particles from an OpenDrift simulation.

Following the collection of particle data, we generate the probability matrix. Each cell in this matrix represents the likelihood of discovering a person at that particular location, as depicted in the subsequent image.


Fig 4: Final probability matrix.

  • Probability Matrix: The probability indicated in each cell reflects the likelihood of finding a person in that specific location.
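
The counting step described in this section can be sketched in plain Python. This is a simplified illustration, not the DSSE implementation. `build_probability_matrix` is a hypothetical name, and the particles are assumed to be already mapped to (x, y) grid cells. Indexing the matrix as matrix[y][x] and normalizing it so that all cells sum to 1 are also assumptions. Cells holding `noise_threshold` particles or fewer are zeroed, mirroring the num_particle_to_filter_as_noise input.

```python
from collections import Counter


def build_probability_matrix(final_cells, grid_size, noise_threshold=1):
    """Count particles per cell, drop noisy cells, and normalize to sum to 1.

    Hypothetical helper, not part of DSSE.
    final_cells: iterable of (x, y) grid cells, one per surviving particle.
    """
    counts = Counter(final_cells)
    matrix = [[0.0] * grid_size for _ in range(grid_size)]
    for (x, y), n_particles in counts.items():
        if n_particles > noise_threshold:  # cells at or below the threshold stay 0
            matrix[y][x] = float(n_particles)
    total = sum(sum(row) for row in matrix)
    if total > 0:
        matrix = [[value / total for value in row] for row in matrix]
    return matrix


# Three particles end in cell (0, 0), one in (1, 1), and two in (2, 2).
cells = [(0, 0)] * 3 + [(1, 1)] + [(2, 2)] * 2
for row in build_probability_matrix(cells, grid_size=3):
    print(row)
```

In this example the single particle in cell (1, 1) is treated as noise, so only two cells keep a non-zero probability.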

Observation

The observation is a dictionary with all the drones as keys, identified by names such as drone0, drone1, etc. Each key is associated with a tuple that contains the drone's current position and its perception of the environment, represented as a probability matrix.

  • Tuple Structure: ((x_position, y_position), probability_matrix)
    • x_position, y_position: The current coordinates of the drone on the grid.
    • probability_matrix: A matrix representing the drone's view of the probability distribution of the target's location across the grid.

An output example can be seen below.

python
{
    'drone0':
        ((5, 5), array([[0., 0., 0., ..., 0., 0., 0.],
                        [0., 0., 0., ..., 0., 0., 0.],
                        [0., 0., 0., ..., 0., 0., 0.],
                        ...,
                        [0., 0., 0., ..., 0., 0., 0.],
                        [0., 0., 0., ..., 0., 0., 0.],
                        [0., 0., 0., ..., 0., 0., 0.]])
        ),

    'drone1':
        ((25, 5), array([[0., 0., 0., ..., 0., 0., 0.],
                         [0., 0., 0., ..., 0., 0., 0.],
                         [0., 0., 0., ..., 0., 0., 0.],
                         ...,
                         [0., 0., 0., ..., 0., 0., 0.],
                         [0., 0., 0., ..., 0., 0., 0.],
                         [0., 0., 0., ..., 0., 0., 0.]])
        ),

    'drone2':
        ((45, 5), array([[0., 0., 0., ..., 0., 0., 0.],
                         [0., 0., 0., ..., 0., 0., 0.],
                         [0., 0., 0., ..., 0., 0., 0.],
                         ...,
                         [0., 0., 0., ..., 0., 0., 0.],
                         [0., 0., 0., ..., 0., 0., 0.],
                         [0., 0., 0., ..., 0., 0., 0.]])
        ),

    .................................

    'droneN':
        ((33, 45), array([[0., 0., 0., ..., 0., 0., 0.],
                          [0., 0., 0., ..., 0., 0., 0.],
                          [0., 0., 0., ..., 0., 0., 0.],
                          ...,
                          [0., 0., 0., ..., 0., 0., 0.],
                          [0., 0., 0., ..., 0., 0., 0.],
                          [0., 0., 0., ..., 0., 0., 0.]])
        ),
}
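
An observation with this structure can be unpacked as in the sketch below. The `observations` dictionary is a small hand-written stand-in that uses nested lists instead of the arrays the environment returns. `most_probable_cell` is a hypothetical helper, and indexing the matrix as matrix[y][x] is an assumption about orientation.

```python
def most_probable_cell(probability_matrix):
    """Return the (x, y) cell with the highest probability.

    Hypothetical helper, not part of DSSE.
    Assumption: the matrix is indexed as matrix[y][x] (row, then column).
    """
    best_value, best_cell = -1.0, None
    for y, row in enumerate(probability_matrix):
        for x, value in enumerate(row):
            if value > best_value:
                best_value, best_cell = value, (x, y)
    return best_cell


# Hand-written stand-in for the dictionary returned by env.reset() or env.step().
observations = {
    "drone0": ((0, 0), [[0.0, 0.1], [0.7, 0.2]]),
    "drone1": ((1, 0), [[0.0, 0.1], [0.7, 0.2]]),
}

for drone, (position, probability_matrix) in observations.items():
    print(drone, "is at", position, "and the most probable cell is",
          most_probable_cell(probability_matrix))
```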

Reward

The reward is returned as a dictionary with the drones' names as keys and their respective rewards as values. For example {'drone0': 1, 'drone1': 89.0, 'drone2': 1}

The reward values are as follows:

  • Default Action: Every action receives a baseline reward of -0.2.
  • Searching a Cell: The reward for searching a cell grows with the probability p of that cell and is given by 1 + (1 - Ts / Ts_limit) * p * n_cells, where n_cells is the number of cells with probability greater than 0, Ts is the terminal timestep, and Ts_limit is the timestep limit.
  • Completing the Search: If all cells are searched, the reward is n_cells + n_cells * (1 - Ts / Ts_limit).
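
The formulas above can be evaluated directly, as in the following sketch. The function names are hypothetical and not part of DSSE. Ts is taken here to be the timestep at which the reward is computed, which is an assumption. How the environment combines these terms with the baseline reward is not specified here, so the three values are kept separate.

```python
# Baseline reward listed above for every action.
DEFAULT_ACTION_REWARD = -0.2


def cell_search_reward(p, n_cells, ts, ts_limit):
    """Reward for searching a cell with probability p at timestep ts."""
    return 1 + (1 - ts / ts_limit) * p * n_cells


def completion_reward(n_cells, ts, ts_limit):
    """Reward granted once every cell with probability > 0 has been searched."""
    return n_cells + n_cells * (1 - ts / ts_limit)


# Searching earlier in the episode yields a larger reward for the same cell.
print(cell_search_reward(p=0.01, n_cells=100, ts=10, ts_limit=100))
print(cell_search_reward(p=0.01, n_cells=100, ts=90, ts_limit=100))
print(completion_reward(n_cells=100, ts=50, ts_limit=100))
```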

Termination & Truncation

The termination and truncation variables each return a dictionary with all drones as keys and booleans as values. By default, these values are set to False and will switch to True under any of the following conditions:

  • Time Limit Exceeded: If the simulation's timestep exceeds the timestep_limit.
  • Done searching all cells: If the agents have searched all cells with probability > 0.

For example, the dictionary might look like this:

python
{'drone0': False, 'drone1': False, 'drone2': False}
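
A training loop often needs a single flag derived from these two dictionaries. The sketch below shows one way to compute it. `episode_over` is a hypothetical helper, not part of DSSE, and it assumes both dictionaries share the same drone keys.

```python
def episode_over(terminations, truncations):
    """Return True when every drone is either terminated or truncated.

    Hypothetical helper, not part of DSSE.
    """
    return all(terminations[drone] or truncations[drone] for drone in terminations)


running = {"drone0": False, "drone1": False, "drone2": False}
finished = {"drone0": True, "drone1": True, "drone2": True}

print(episode_over(running, running))   # no drone has finished yet
print(episode_over(running, finished))  # every drone was truncated
```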

Info

The Info is structured as a dictionary of dictionaries, where each drone, such as drone0, serves as a key. The associated value is another dictionary containing several key metrics:

  • is_completed: a boolean indicating whether the drone has searched all grid cells. It starts as False and changes to True once the drone has completed its search.
  • coverage_rate: the percentage of the grid that has been covered by the drone.
  • repeated_coverage: the percentage of the grid that has been covered more than once, indicating overlap in search areas.
  • accumulated_pos: The accumulated Probability of Success (POS) of the SAR mission, which quantifies the chance of finding all SAR targets within a mission.

The info section serves as an indicator of the progress of the search operation.

For example, here is how the dictionary appears before any drone has completed its search:

python
{'drone0': {'is_completed': False, 'coverage_rate': 0.5693877551020409, 'repeated_coverage': 0.0010204081632653062, 'acumulated_pos': 0}}

Once the search has been completed, the dictionary updates to reflect this:

python
{'drone0': {'is_completed': True, 'coverage_rate': 100, 'repeated_coverage': 2.912397984939490308, 'acumulated_pos': 1.0}}

This setup allows users to continuously monitor and assess the effectiveness of the search operation during the simulation.
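
The metrics can be collected per drone as in the sketch below. `summarize_info` is a hypothetical helper, not part of DSSE. Because this page shows the POS key as both accumulated_pos and acumulated_pos, the helper accepts either spelling rather than assuming one. The sample dictionary is the first example shown above.

```python
def summarize_info(infos):
    """Return {drone: (is_completed, coverage_rate, accumulated POS)}.

    Hypothetical helper, not part of DSSE. Accepts both spellings of the
    POS key that appear on this page.
    """
    summary = {}
    for drone, info in infos.items():
        pos = info.get("accumulated_pos", info.get("acumulated_pos"))
        summary[drone] = (info["is_completed"], info["coverage_rate"], pos)
    return summary


infos = {
    "drone0": {
        "is_completed": False,
        "coverage_rate": 0.5693877551020409,
        "repeated_coverage": 0.0010204081632653062,
        "acumulated_pos": 0,
    }
}
print(summarize_info(infos))
```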

env.get_agents:

The env.get_agents() method returns a list of all the possible agents (drones) currently initialized in the simulation. You can use it to confirm that all the drones exist in the environment. For example, in an environment configured with 10 drones, the method would return:

python
['drone0', 'drone1', 'drone2', 'drone3', 'drone4', 'drone5', 'drone6', 'drone7', 'drone8', 'drone9']

env.save_matrix:

The env.save_matrix() method enables saving the probability matrix as a .npy file. To use this method, you must provide a string parameter that specifies the file path where the matrix should be saved. For example:

python
env.save_matrix("path/to/save/matrix.npy")

After saving, you can load this matrix in a newly created environment using the prob_matrix_path parameter:

python
env = CoverageDroneSwarmSearch(
    drone_amount=3,
    render_mode="human",
    disaster_position=(-24.04, -46.17),  # (lat, long)
    pre_render_time=10, # hours to simulate
    prob_matrix_path="path/to/save/matrix.npy"
)

Warning

Ensure that the disaster_position and pre_render_time parameters are the same as those used to generate the matrix to maintain consistency in the simulation conditions.

Tip

The env.save_matrix() method is not only convenient for saving the probability matrix post-simulation but also essential for reusing it in subsequent simulations without needing to regenerate it. It facilitates varying simulation parameters, such as the number of drones, their speed, or timestep, for different experimental setups.
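
One way to apply this reuse pattern is to pass prob_matrix_path only when a saved file already exists. The sketch below builds the extra keyword arguments with the standard library. `matrix_kwargs` is a hypothetical helper, not part of DSSE, and the commented lines show how it might be combined with the constructor and env.save_matrix().

```python
from pathlib import Path


def matrix_kwargs(path):
    """Return constructor keyword arguments that reuse a saved matrix if present.

    Hypothetical helper, not part of DSSE.
    """
    if Path(path).is_file():
        return {"prob_matrix_path": str(path)}
    return {}  # fall back to the default: a new matrix is generated


kwargs = matrix_kwargs("path/to/save/matrix.npy")
print(kwargs)

# Possible usage (requires DSSE to be installed):
# env = CoverageDroneSwarmSearch(drone_amount=3, pre_render_time=10, **kwargs)
# if not kwargs:
#     env.save_matrix("path/to/save/matrix.npy")
```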

env.close:

env.close() closes the render window. Calling it is optional.

Stay Updated

We appreciate your patience and interest in our work. If you have any questions or need immediate assistance regarding our Coverage Environment, please do not hesitate to contact us via our GitHub Issues page.

License

This documentation is licensed under the terms of the MIT License. See the LICENSE file for more details.
