cvkit.pose_estimation package

Subpackages

Submodules

cvkit.pose_estimation.config module

class AnnotationConfig(name, data_dictionary)[source]

Bases: object

Stores information about data files for each view of the project.

Parameters: data_dictionary (dict) – Dictionary containing annotation metadata.

annotation_file: Path of the annotation data file.

annotation_file_flavor: Flavor of the data file. Refer :py:attr:cvkit.pose_estimation.data_readers.datastore_interface.DataStoreInterface.FLAVOR

video_file: Path to the video file

video_reader: Flavor of the video file. Refer cvkit.video_readers.video_reader_interface.BaseVideoReaderInterface.FLAVOR

view: Name of the camera

class CameraViews(data_dictionary, framerate)[source]

Bases: object

Stores metadata of the camera setup.

Parameters:

data_dictionary (dict) – Dictionary containing the metadata for the recording setup.
framerate (float) – Project-level framerate (assumes the same framerate for all views).

axes: Contains 2D x_max, y_max, and origin. This can be used to create a coordinate system for the reconstructed data.

dlt_coefficients: DLT co-efficients generated by the EasyWand package.

f_px: Focal length in pixels

pos: Extrinsic data: Position of the camera in world coordinates.

principal_point: Principal point of the camera lens.

resolution: Intrinsic Data: Resolution of the captured video.

DEFAULT_THRESHOLD = 0.6: Default likelihood threshold value

class PoseEstimationConfig(path)[source]

Bases: object

This class is used to read and write pose estimation configuration files. It contains basic information about the experiments such as the body parts of the tracked subject, their connectivity, camera setup, data folder, data files, reconstruction parameters, and so on.

# Project Name
name: unnamed_project

# Valid path to output folder
output_folder: ''
# List of body parts
body_parts:
  - part_1
  - part_2
  - part_3
  - part_4
# List of lists defining body part connections
skeleton:
  - - part_1
    - part_3
  - - part_2
    - part_3
  - - part_3
    - part_4
# List of colors (R,G,B). If enough colors are not provided, others will be randomly generated.
colors: #Optional
    - [ 230, 25, 75 ] # Color for part_1
    - [ 60, 180, 75 ] # Color for part_2
    - [ 255, 225, 25 ] # Color for part_3

# Reconstruction configuration parameters
Reconstruction:

    # Project level framerate. We currently only support videos with equal      #
    # framerate                                                                 #
    framerate: 60

    # Unscaled length of the x-axis
    x_len: <length>

    # Unscaled length of the y-axis
    y_len: <length>

    # Reconstruction algorithm, accepts 'default' or 'auto_subset'              #
    # default: Reconstructs if likelihood is higher than the threshold for all  #
    # views.                                                                    #
    # auto_subset: Automatically creates a subset of 'accurate' viewpoints      #
    # based on the threshold value. The reconstruction is performed if the      #
    # number of viewpoints is more than 2.                                      #
    reconstruction_algorithm: default # Optional

    # Rotation Matrix to align 3D reconstructed data. It will be multiplied     #
    # after initial reconstruction.                                             #
    rotation_matrix: # Optional
    - [ 1.0, 0.0, 0.0 ]
    - [ 0.0, 1.0, 0.0 ]
    - [ 0.0, 0.0, 1.0 ]

    # The desired scale factor for converting reconstructed data's units.       #
    # Example: If reconstructed data is in meters, scale can be set to 1000 to  #
    # generate data in millimeters.                                             #
    scale: 1.0

    # Scale factor that can be computed through update_alignment_matrices.       #
    # This uses pre-known distances on the arena to adjust the desired scaling   #
    # factor for mitigating reconstruction noise.                                #
    computed_scale: [1.0,1.0,1.0] # Optional, defaults to scale

    # Project level likelihood threshold value.
    threshold: 0.75

    # Static translation vector. Added after scaling.                           #
    # Used for moving origin to desired location.                               #
    # Note: The translation vector has to be scaled before adding               #
    translation_vector: [ 0, 0, 0 ]

    # Axis Alignment vector. Used to flip targeted axis.                        #
    # Only accepts either 1 or -1, indicating whether the corresponding axis    #
    # will be flipped.                                                          #
    # [-1,1,-1] Flips x and z axes.                                             #
    axis_rotation_3D: [1,1, 1]
annotation:
    VIEW_NAME_1:
        annotation_file: '' # Path to datastore containing pose data for the view
        annotation_file_flavor: <flavor> # DataStoreInterface Flavor
        video_file: '' # Path to the video file for the view
        video_reader: <flavor> # BaseVideoReaderInterface Flavor
        # Corresponding Camera ID. Use None for importing video data not        #
        # corresponding to any cameras.                                         #
        view: None

    # Repeat for each annotated view

views:
    Cam_id_1:
        axes:
            origin: [-1, -1 ] # 2D position of the origin for this camera view
            x_max: [ -1, -1 ] # 2D position of the x max location for this camera view
            y_max: [ -1, -1 ] # 2D position of the y max location for this camera view
        dlt_coefficients: <list of 12 numbers representing the DLT co-efficients for this camera view>
        f_px: -1 # Focal length of the camera in px
        pos: [ ] # Position of the camera in world coordinates.
        principal_point: [ ] # Principal point of the camera.
        resolution: [ ] # Resolution of the captured frames.

    # Repeat for each camera.

Parameters: path (str) – Path to the YAML file

annotation_views: A dictionary mapping views to its corresponding data files - AnnotationConfig.

axis_rotation_3D: 3 dimensional list where all the elements are either 1 or -1. This can be used to flip desired axis.

body_parts: List of body parts

colors: Custom colors for each body part. Colors are randomly generated if not explicitly provided.

computed_scale: Computed scale factor based on pre-known distances to reduce reconstruction noise

framerate: Project level framerate

num_parts: Number of body parts

output_folder: Path for the output directory

project_name: The name of the Project

reconstruction_algorithm: Reconstruction algorithm. Auto-Subset: Picks at least 2 views based on likelihood values. Regular: Only reconstructs if all views have likelihood higher than the threshold.

rotation_matrix: 3x3 Rotation matrix for aligning reconstructed data.

scale: Project level scale factor for reconstructed data.

skeleton: Defines connectivity among the body parts.

threshold: Threshold value for the project.

translation_vector: Fixed 3-D translational vector for reconstructed data.

views: A dictionary mapping view names to camera information - CameraViews.

save_config(path, data_dict)[source]

Saves the given data dictionary to a YAML file.

Parameters:

path (str) – Output File Path
data_dict (dict) – dictionary containing the project configuration
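Before saving a configuration dictionary, it can be useful to check that it is internally consistent. The sketch below uses only the keys shown in the YAML template; check_config is a hypothetical helper and is not part of cvkit.

```python
def check_config(data_dict):
    """Return a list of problems found in a configuration dictionary."""
    problems = []
    parts = data_dict.get("body_parts", [])
    if len(set(parts)) != len(parts):
        problems.append("body_parts contains duplicates")
    # Every skeleton connection should join two declared body parts.
    for edge in data_dict.get("skeleton", []):
        if len(edge) != 2:
            problems.append(f"skeleton edge {edge} does not have exactly two parts")
        for part in edge:
            if part not in parts:
                problems.append(f"skeleton references unknown part {part!r}")
    # axis_rotation_3D only accepts 1 or -1 for each of the three axes.
    flips = data_dict.get("Reconstruction", {}).get("axis_rotation_3D", [1, 1, 1])
    if len(flips) != 3 or any(f not in (1, -1) for f in flips):
        problems.append("axis_rotation_3D must hold three values, each 1 or -1")
    return problems


config = {
    "body_parts": ["part_1", "part_2", "part_3"],
    "skeleton": [["part_1", "part_3"], ["part_2", "part_9"]],
    "Reconstruction": {"axis_rotation_3D": [1, 1, -1]},
}
for problem in check_config(config):
    print(problem)  # skeleton references unknown part 'part_9'
```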

cvkit.pose_estimation.skeleton module

class Part(arr, name, likelihood)[source]

Bases: ndarray

Represents a body part of the tracked subject.

#2D Part pointing to l_eye with 0.7 likelihood.
part = Part([100,200],'l_eye',0.7)
#3D Part pointing to l_eye with 0.5 likelihood.
part_l_eye_3d = Part([100,100,50],'l_eye',0.5)
#3D Part pointing to r_eye with 0.5 likelihood.
part_r_eye_3d = Part([100,100,50],'r_eye',0.5)
#3D Part pointing to eye_mid with 0.5 likelihood.
part_eye_mid_3d = (part_l_eye_3d + part_r_eye_3d)/2

Parameters:

arr (list, numpy.ndarray) – Array of N values defining the position in N-dimensional space
name (str) – Name of the body part
likelihood (float) – A value indicating confidence in the accuracy of the position defined by arr

magnitude()[source]

Computes the magnitude of the Part.

Returns: Magnitude of the Part.
Return type: float

numpy()[source]: Creates a numpy array from Part :return: An N-Dimensional numpy array :rtype: :class:’numpy.ndarray’

class Skeleton(body_parts: list, part_map: dict | None = None, likelihood_map: dict | None = None, behaviour: list = [], dims=3)[source]

Bases: object

This class represents the skeleton of the tracked subject.

body_parts = ['snout','headBase']
data_map_1 = {'snout':[200,300,50],'headBase':[200,270,100]}
likelihood_map_1 = {'snout':0.7,'headBase':0.8}
current_behaviours = ['rearing']

# Skeleton at t = 0
# Arguments: list of body parts, data dictionary, likelihood dictionary, behaviour list (default empty), dimensions (default 3)
# For 2D skeleton set dims = 2
skeleton_1 = Skeleton(body_parts,data_map_1,likelihood_map_1,current_behaviours)

data_map_2 = {'snout':[100,300,50],'headBase':[100,270,100]}
likelihood_map_2 = {'snout':0.7,'headBase':0.8}

# Skeleton at t = 1
skeleton_2 = Skeleton(body_parts,data_map_2,likelihood_map_2)

# Displacement
displacement = skeleton_2 - skeleton_1
print(displacement['snout'],displacement['headBase'])

#Head Direction
head_direction = skeleton_1['snout'] - skeleton_1['headBase']

#Support broadcast operations
skeleton_1 = skeleton_1 + [10,20,-5]    # non-uniform translation
skeleton_1 = skeleton_1 + 5             # uniform translation
skeleton_1 = skeleton_1 * 2             # uniform scaling
skeleton_1 = skeleton_1 * [0.5,1,0.5]   # non-uniform scaling

#Supports elementwise operations
skeleton_3 = skeleton_1 + skeleton_2
skeleton_3 = skeleton_1 * skeleton_2

#Normalize skeleton between 0 and 1.0
min_coordinates = [0,0,0] # Define minimum coordinate values
max_coordinates = [1000,1000,500] # Define maximum coordinate values
skeleton_1 = skeleton_1.normalize(max_lim=max_coordinates, min_lim=min_coordinates)

Parameters:

body_parts (list[str]) – list of body parts
part_map (dict) – A dictionary where the key is body part and value is its corresponding n-dimensional data.
likelihood_map (dict) – A dictionary where the key is body part and value is its corresponding likelihood data.
behaviour (list[str]) – list of labels defining the behaviour of the subject at current frame.
dims (int) – Dimension of underlying data.

normalize(max_lim, min_lim)[source]

Normalizes the skeleton so that the values range from 0.0 to 1.0

Parameters:

max_lim (list[float]) – The maximum limit of the coordinate system. n-dimensional list of coordinates.
min_lim (list[float]) – The minimum limit of the coordinate system. n-dimensional list of coordinates.

Returns:

Normalized Skeleton Object

Return type:

Skeleton
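As a rough illustration of what normalizing to the 0.0 to 1.0 range means for a single body part, the sketch below applies per-axis min-max scaling in plain Python. It assumes this is the formula used; normalize_point is a hypothetical helper, not a cvkit function.

```python
def normalize_point(point, max_lim, min_lim):
    """Scale each coordinate to [0, 1] using per-axis limits (assumed min-max formula)."""
    return [(p - lo) / (hi - lo) for p, hi, lo in zip(point, max_lim, min_lim)]


snout = [200, 300, 50]
print(normalize_point(snout, max_lim=[1000, 1000, 500], min_lim=[0, 0, 0]))
# [0.2, 0.3, 0.1]
```

Note the argument order: max_lim comes before min_lim, matching the signature of normalize above.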

cvkit.pose_estimation.utils module

compute_distance_matrix(skeleton, threshold=0.6)[source]

Generates nxn Euclidean distance matrix for given skeleton where n = number of body parts.

Parameters:

skeleton – Input skeleton
threshold – Threshold for considering a body part as valid

Returns:

nxn numpy array containing Euclidean distance among all body parts.
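The idea of a thresholded distance matrix can be sketched without the library. In this hypothetical version, entries involving a body part whose likelihood is below the threshold are set to NaN; how cvkit itself marks invalid parts is not stated here, so treat that as an assumption.

```python
import math


def distance_matrix(parts, likelihoods, threshold=0.6):
    """Pairwise Euclidean distances between body parts as a nested list.

    Assumption: pairs involving a low-likelihood part are reported as NaN.
    """
    names = list(parts)
    n = len(names)
    matrix = [[math.nan] * n for _ in range(n)]
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            if likelihoods[a] >= threshold and likelihoods[b] >= threshold:
                matrix[i][j] = math.dist(parts[a], parts[b])
    return matrix


parts = {"snout": [0, 0, 0], "headBase": [3, 4, 0], "tail": [10, 0, 0]}
likelihoods = {"snout": 0.9, "headBase": 0.8, "tail": 0.2}
dist = distance_matrix(parts, likelihoods)
print(dist[0][1])  # 5.0
print(dist[0][2])  # nan, because 'tail' is below the threshold
```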

get_spherical_coordinates(v1, is_degrees=True)[source]

Computes theta and phi spherical coordinates for input 3D vector

Parameters:

v1 – Input vector
is_degrees – Flag selecting degrees (True) or radians (False)

Returns:

[theta, phi] spherical coordinates
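The angle convention is not spelled out above, so the following is only a sketch under one common assumption: theta is the azimuth in the x-y plane and phi is the polar angle measured from the +z axis. The function name spherical_angles is hypothetical, and cvkit may use a different convention.

```python
import math


def spherical_angles(v, is_degrees=True):
    """Return [theta, phi] for a 3D vector under the assumed convention."""
    x, y, z = v
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0:
        raise ValueError("angles are undefined for the zero vector")
    theta = math.atan2(y, x)  # assumed: azimuth in the x-y plane
    phi = math.acos(z / r)    # assumed: polar angle from the +z axis
    if is_degrees:
        return [math.degrees(theta), math.degrees(phi)]
    return [theta, phi]


angles = spherical_angles([0, 1, 0])
print(angles)  # both angles are approximately 90 degrees
```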

magnitude(vector)[source]

Computes magnitude of the vector

Parameters: vector – Input vector
Returns: Frobenius norm of the vector

normalize_vector(vector)[source]

Normalizes the input vector.

Parameters: vector – Input vector
Returns: Normalized input vector
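For a 1-D vector the Frobenius norm is the ordinary Euclidean length, and normalizing divides each component by it. A minimal pure-Python sketch follows; the function names are hypothetical, and returning a zero vector unchanged is an assumption rather than documented behaviour.

```python
import math


def vector_magnitude(vector):
    """Euclidean length of a vector."""
    return math.sqrt(sum(c * c for c in vector))


def unit_vector(vector):
    """Scale a vector to length 1; a zero vector is returned unchanged (assumption)."""
    norm = vector_magnitude(vector)
    if norm == 0:
        return list(vector)
    return [c / norm for c in vector]


v = [3, 4, 0]
print(vector_magnitude(v))  # 5.0
print(unit_vector(v))       # [0.6, 0.8, 0.0]
```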

rotate(vector, rotation, scale=1.0, is_inv=False, axis_alignment_vector=None)[source]

Rotates a vector with the rotation matrix, then multiplies it by the axis alignment vector, then applies linear scaling. If is_inv is set, the opposite order is used: linear de-scaling first, followed by axis alignment and rotation. Note: although the function computes the inverse of the scale, it does not compute the inverse of the rotation.

Parameters:

vector – The vector to be rotated
rotation – Rotation Matrix
scale – Scaling Factor (accepts numpy array defining separate scaling factor for each axis)
is_inv – Flag for deciding flow of operations (‘rotate → axis alignment → scale’ or ‘de-scale → axis alignment → rotate’)

Returns:

rotated, scaled, and aligned vector
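The order of operations described for rotate can be sketched in plain Python as follows. This is an illustration, not the library code: rotate_sketch and mat_vec are hypothetical names, and, as the note above says, the caller supplies the inverse rotation matrix when is_inv is set.

```python
def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def rotate_sketch(vector, rotation, scale=1.0, is_inv=False, axis_alignment_vector=None):
    flips = axis_alignment_vector or [1] * len(vector)
    # A scalar scale is applied to every axis; a list gives one factor per axis.
    scales = scale if isinstance(scale, (list, tuple)) else [scale] * len(vector)
    if not is_inv:
        out = mat_vec(rotation, vector)               # rotate
        out = [o * f for o, f in zip(out, flips)]     # axis alignment
        return [o * s for o, s in zip(out, scales)]   # scale
    out = [v / s for v, s in zip(vector, scales)]     # de-scale
    out = [o * f for o, f in zip(out, flips)]         # axis alignment
    return mat_vec(rotation, out)                     # rotate with the matrix as given


rot_z_90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]      # 90 degree rotation about z
rot_z_90_inv = [[0, 1, 0], [-1, 0, 0], [0, 0, 1]]  # its transpose, i.e. its inverse

forward = rotate_sketch([1, 0, 0], rot_z_90, scale=1000, axis_alignment_vector=[1, -1, 1])
print(forward)  # [0, -1000, 0]
back = rotate_sketch(forward, rot_z_90_inv, scale=1000, is_inv=True,
                     axis_alignment_vector=[1, -1, 1])
print(back)     # recovers the original vector [1, 0, 0] as floats
```

Because the axis alignment vector only holds 1 or -1, applying it twice cancels out, which is why the same vector can be reused in the inverse direction.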

spherical_angle_difference(v1, v2, is_abs=True)[source]

Calculates shifted difference (v1-v2) between two spherical coordinate vectors.

Parameters:

v1 (numpy.ndarray) – Target input vector
v2 (numpy.ndarray) – Source input vector
is_abs (bool) – controls whether the difference is absolute

Returns:

shifted spherical angle difference between two vectors

Return type:

np.ndarray
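One plausible reading of "shifted difference" is that each angular difference is wrapped into the range [-180, 180) so that angles on either side of the wrap-around point compare correctly. The sketch below implements that reading for angles in degrees; both the interpretation and the name wrapped_difference are assumptions.

```python
def wrapped_difference(v1, v2, is_abs=True):
    """Per-angle difference v1 - v2 in degrees, wrapped into [-180, 180)."""
    diff = [((a - b + 180.0) % 360.0) - 180.0 for a, b in zip(v1, v2)]
    return [abs(d) for d in diff] if is_abs else diff


target = [170.0, 10.0]   # [theta, phi] in degrees
source = [-170.0, 20.0]
print(wrapped_difference(target, source))                # [20.0, 10.0]
print(wrapped_difference(target, source, is_abs=False))  # [-20.0, -10.0]
```

Without wrapping, the theta difference would be 340 degrees even though the two directions are only 20 degrees apart.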
