helikite

Submodules

Attributes

__version__

__appname__

__description__

Classes

Cleaner

Level 0 processor for synchronizing timestamps across instruments and merging their data into a unified structure.

Package Contents

class helikite.Cleaner(output_schema: helikite.classes.output_schemas.OutputSchema, input_folder: str | pathlib.Path, flight_date: datetime.date, instruments: list[helikite.instruments.base.Instrument] | None = None, reference_instrument: helikite.instruments.base.Instrument | None = None, reference_instrument_shift: str | None = None, flight: str | None = None, time_takeoff: datetime.datetime | None = None, time_landing: datetime.datetime | None = None, time_offset: datetime.time = datetime.time(0, 0), interactive: bool = True)

Bases: helikite.classes.base.BaseProcessor

Level 0 processor for synchronizing timestamps across instruments and merging their data into a unified structure.

property level: helikite.classes.output_schemas.Level

Processing level identifier.

_instruments: list[helikite.instruments.base.Instrument] = []
input_folder: str
flight = None
flight_date: datetime.date
time_takeoff: datetime.datetime | None = None
time_landing: datetime.datetime | None = None
time_offset: datetime.time
pressure_column: str = 'pressure'
master_df: pandas.DataFrame | None = None
housekeeping_df: pandas.DataFrame | None = None
_reference_instrument: helikite.instruments.base.Instrument = None
_reference_instrument_shift: str | None = None
property df: pandas.DataFrame | None

Return the current state of the dataframe.

_data_state_info() list[str]
set_pressure_column(column_name_override: str | None = None) None

Set the pressure column for each instrument’s dataframe

set_time_as_index() None

Set the time column as the index for each instrument dataframe

data_corrections(start_altitude: float = None, start_pressure: float = None, start_temperature: float = None) None

Apply instrument-specific correction routines.

plot_pressure() None

Creates a plot with the pressure measurement of each instrument

Assumes the pressure column has been set for each instrument

plot_time_sync(save_path: str | pathlib.Path, skip: list[helikite.instruments.base.Instrument])

Visualize pressure alignment after time synchronization.

remove_duplicates() None

Remove rows with duplicate time indices from each instrument. Also clear repeated values in the ‘msems_scan_’ and ‘msems_inverted_’ columns and in specific ‘mcda_*’ columns, keeping only the first instance.
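The "keep only the first instance" rule can be illustrated with a minimal standard-library sketch. It is not the library's implementation: the function name drop_duplicate_times and the (timestamp, values) row layout are assumptions made for illustration.

```python
from datetime import datetime


def drop_duplicate_times(rows):
    """Keep the first row seen for each timestamp, preserving input order.

    `rows` is a list of (timestamp, values) tuples; this layout is a
    hypothetical stand-in for an instrument dataframe indexed by time.
    """
    seen = set()
    unique = []
    for timestamp, values in rows:
        if timestamp in seen:
            continue  # a later row with an already-seen time index is dropped
        seen.add(timestamp)
        unique.append((timestamp, values))
    return unique


rows = [
    (datetime(2024, 1, 1, 12, 0, 0), {"pressure": 1001.2}),
    (datetime(2024, 1, 1, 12, 0, 0), {"pressure": 1001.9}),  # duplicate time
    (datetime(2024, 1, 1, 12, 0, 1), {"pressure": 1000.8}),
]
deduplicated = drop_duplicate_times(rows)
```

On a pandas DataFrame, a comparable result is commonly obtained with df[~df.index.duplicated(keep="first")].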

merge_instruments(tolerance_seconds: int = 0, remove_duplicates: bool = True) None

Merges all the dataframes from the instruments into one dataframe.

All columns from all instruments are included in the merged dataframe, with unique prefixes to avoid column name collisions.

Parameters:
  • tolerance_seconds (int) – The tolerance in seconds for merging dataframes.

  • remove_duplicates (bool) – If True, removes duplicate times and keeps the first result.
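To make the effect of tolerance_seconds and column prefixes concrete, here is a small standard-library sketch of a nearest-timestamp merge. It is an illustration under assumptions, not the method used by merge_instruments: the helper names, the "probe_" prefix, and the (timestamp, values) row layout are all hypothetical.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def nearest_within_tolerance(sorted_times, target, tolerance):
    """Return the index of the time closest to `target`, or None if that
    time is farther away than `tolerance`."""
    if not sorted_times:
        return None
    pos = bisect_left(sorted_times, target)
    candidates = [i for i in (pos - 1, pos) if 0 <= i < len(sorted_times)]
    best = min(candidates, key=lambda i: abs(sorted_times[i] - target))
    return best if abs(sorted_times[best] - target) <= tolerance else None


def merge_with_prefix(reference, other, prefix, tolerance_seconds=0):
    """Attach the columns of `other` to each reference row, renamed with
    `prefix` so that column names cannot collide."""
    tolerance = timedelta(seconds=tolerance_seconds)
    other_times = [time for time, _ in other]  # assumed sorted by time
    merged = []
    for time, ref_values in reference:
        row = dict(ref_values)
        match = nearest_within_tolerance(other_times, time, tolerance)
        if match is not None:
            for name, value in other[match][1].items():
                row[f"{prefix}{name}"] = value
        merged.append((time, row))
    return merged


reference = [
    (datetime(2024, 1, 1, 12, 0, 0), {"ref_pressure": 1000.0}),
    (datetime(2024, 1, 1, 12, 0, 1), {"ref_pressure": 999.5}),
    (datetime(2024, 1, 1, 12, 0, 2), {"ref_pressure": 999.0}),
]
other = [
    (datetime(2024, 1, 1, 12, 0, 0, 400000), {"temperature": 21.0}),
    (datetime(2024, 1, 1, 12, 0, 2), {"temperature": 21.5}),
]
strict = merge_with_prefix(reference, other, "probe_", tolerance_seconds=0)
loose = merge_with_prefix(reference, other, "probe_", tolerance_seconds=1)
```

With a tolerance of 0, only exactly matching timestamps are joined; with a tolerance of 1 second, the reading taken at 12:00:00.4 is also attached to nearby reference rows. In pandas, merge_asof with its tolerance argument provides a similar kind of join.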

export_data(filepath: str | pathlib.Path | None = None) None

Export all data columns from all instruments to local files

The function will export a CSV and a Parquet file with all columns from all instruments. The files will be saved in the current working directory unless a filepath is provided.

The Parquet file will include the metadata from the class.

_apply_rolling_window_to_pressure(instrument, window_size: int = 20)

Apply a rolling window to the pressure measurements of the given instrument

Then plot the pressure measurements with the rolling window applied
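As a rough illustration of what a rolling window does to a pressure series, the sketch below computes a trailing rolling mean with the standard library. The choice of a mean as the aggregation and the name rolling_mean are assumptions; the source does not state which statistic the method applies.

```python
from collections import deque


def rolling_mean(values, window_size=20):
    """Trailing rolling mean. Positions where the window is not yet full
    yield None, similar to the NaN padding of a pandas rolling mean."""
    window = deque(maxlen=window_size)
    running_sum = 0.0
    smoothed = []
    for value in values:
        if len(window) == window_size:
            running_sum -= window[0]  # oldest sample is about to be evicted
        window.append(value)
        running_sum += value
        if len(window) == window_size:
            smoothed.append(running_sum / window_size)
        else:
            smoothed.append(None)
    return smoothed


smoothed = rolling_mean([1000.0, 998.0, 996.0, 997.0, 995.0], window_size=3)
```

A larger window suppresses more sensor noise but also flattens short pressure features, which matters when the smoothed signal is later used for alignment.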

define_flight_times()

Creates a plot to select the start and end of the flight

Uses the pressure measurements of the reference instrument to select the start and end of the flight. The user can click on the plot to select the points.

pressure_based_time_synchronization(max_lag: float = np.inf)

Time-synchronizes instruments by maximizing cross-correlation between the pressure data of the reference instrument and the other instruments. Runs multiple iterations using a coarse-to-fine scheme, starting with a coarse adjustment and refining until the final lags are found.

Parameters:

max_lag (float, optional) – Maximum allowed time lag, in seconds, used at the first (coarsest) level. If None, it defaults to half of the flight duration.
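The coarse-to-fine idea can be sketched with the standard library as follows. This is an illustration under stated assumptions, not the library's algorithm: both signals are assumed to be sampled at 1 Hz on a shared index, lags are integers, the step is halved at each level, and the names correlation_at_lag, coarse_to_fine_lag, and pressure_profile are hypothetical.

```python
import math


def correlation_at_lag(reference, other, lag):
    """Pearson correlation between reference[i] and other[i + lag],
    computed over the samples where both exist."""
    pairs = [
        (reference[i], other[i + lag])
        for i in range(len(reference))
        if 0 <= i + lag < len(other)
    ]
    if len(pairs) < 2:
        return -math.inf
    mean_r = sum(r for r, _ in pairs) / len(pairs)
    mean_o = sum(o for _, o in pairs) / len(pairs)
    cov = sum((r - mean_r) * (o - mean_o) for r, o in pairs)
    var_r = sum((r - mean_r) ** 2 for r, _ in pairs)
    var_o = sum((o - mean_o) ** 2 for _, o in pairs)
    if var_r == 0 or var_o == 0:
        return -math.inf
    return cov / math.sqrt(var_r * var_o)


def coarse_to_fine_lag(reference, other, max_lag, coarse_step=8):
    """Search lags in [-max_lag, max_lag] on a coarse grid, then repeatedly
    halve the step and search only around the best lag found so far."""
    center, half_width, step = 0, max_lag, coarse_step
    while True:
        lags = range(center - half_width, center + half_width + 1, step)
        center = max(
            lags, key=lambda lag: correlation_at_lag(reference, other, lag)
        )
        if step == 1:
            return center
        half_width, step = step, max(1, step // 2)


def pressure_profile(i):
    """Synthetic flight: pressure dips from 1000 hPa and recovers."""
    return 1000.0 - 100.0 * math.exp(-((i - 150) / 40.0) ** 2)


reference = [pressure_profile(i) for i in range(300)]
other = [pressure_profile(i - 13) for i in range(300)]  # delayed by 13 s
best_lag = coarse_to_fine_lag(reference, other, max_lag=60)
```

In this sketch a positive lag means the other instrument records the same pressure feature later than the reference. Refining only around the coarse optimum keeps the number of evaluated lags small, at the cost of assuming the correlation has a single dominant peak.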

_get_best_instrument_lags(df_pressure, lags: numpy.ndarray) dict[helikite.instruments.base.Instrument, numbers.Number]
correct_time_and_pressure(max_lag=180, walk_time_seconds: int | None = None, apply_rolling_window_to: list[helikite.instruments.base.Instrument] = [], rolling_window_size: int = constants.ROLLING_WINDOW_DEFAULT_SIZE, reference_pressure_thresholds: tuple[float, float] | None = None, detrend_pressure_on: list[helikite.instruments.base.Instrument] = [], offsets: list[tuple[helikite.instruments.base.Instrument, int]] = [], match_adjustment_with: list[tuple[helikite.instruments.base.Instrument, helikite.instruments.base.Instrument]] = [])

Correct time and pressure for each instrument based on time lag.

Parameters:
  • max_lag (int) – The maximum time lag to consider for cross-correlation.

  • walk_time_seconds (int) – The time in seconds to walk the pressure data to match the reference instrument.

  • apply_rolling_window_to (list[Instrument]) – A list of instruments whose pressure data should have a rolling window applied.

  • rolling_window_size (int) – The size of the rolling window to apply to the pressure data.

  • reference_pressure_thresholds (tuple[float, float]) – A tuple with two values (low, high) to apply a threshold to the reference instrument’s pressure data.

  • detrend_pressure_on (list[Instrument]) – A list of instruments whose pressure data should be detrended.

  • offsets (list[tuple[Instrument, int]]) – A list of tuples with an instrument and an offset in seconds to apply to the time index.

  • match_adjustment_with (list[tuple[Instrument, Instrument]]) – A list of tuples of two instruments that should share the same time adjustment. This is useful, for example, when an instrument has no pressure column and must therefore reuse the time adjustment of another instrument. The first instrument in each tuple is the one that has the index adjustment, and the second is the one that will be adjusted.

_build_df_pressure()
shift_msems_columns_by_90s()

Shift all ‘msems_inverted_’ and ‘msems_scan_’ columns by 90 seconds in time.

fill_missing_timestamps(instrument: helikite.instruments.base.Instrument, freq: str = '1S', fill_method: str | None = None)

Reindex the DataFrame of the instrument to fill in missing timestamps at the specified frequency. Optionally forward- or backward-fill missing values. Prints the number of timestamps added.

Parameters:
  • instrument (Instrument) – The instrument whose DataFrame (with a DatetimeIndex) will be reindexed.

  • freq (str) – The desired frequency for the DatetimeIndex (e.g., “1S” for 1 second).

  • fill_method (str or None) – Method to fill missing values: “ffill”, “bfill”, or None (default: None).
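The reindex-and-fill behaviour described above can be illustrated with a standard-library sketch. It is not the library's code: the name fill_gaps and the dict-of-timestamps layout are assumptions, existing timestamps are assumed to lie on the regular grid, and the count of added timestamps is returned here rather than printed.

```python
from datetime import datetime, timedelta


def fill_gaps(rows, step=timedelta(seconds=1), fill_method=None):
    """Insert missing timestamps between the first and last entry of `rows`
    (a dict mapping datetime to value) and optionally fill them.

    Returns (filled_rows, number_of_timestamps_added).
    """
    if not rows:
        return {}, 0
    filled = {}
    added = 0
    current, end = min(rows), max(rows)
    while current <= end:
        if current in rows:
            filled[current] = rows[current]
        else:
            filled[current] = None  # new timestamp with no measurement yet
            added += 1
        current += step
    if fill_method in ("ffill", "bfill"):
        times = list(filled)  # chronological, since dicts keep insertion order
        if fill_method == "bfill":
            times.reverse()
        carried = None
        for time in times:
            if filled[time] is None:
                filled[time] = carried
            else:
                carried = filled[time]
    return filled, added


t0 = datetime(2024, 1, 1, 12, 0, 0)
rows = {t0: 1000.0, t0 + timedelta(seconds=3): 997.0}
unfilled, added = fill_gaps(rows)
forward, _ = fill_gaps(rows, fill_method="ffill")
backward, _ = fill_gaps(rows, fill_method="bfill")
```

Here two timestamps are added between 12:00:00 and 12:00:03. Forward filling repeats the last known value, while backward filling copies the next known one. Recent pandas releases prefer the lowercase frequency alias "1s" over "1S".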

static detect_instruments(output_schema: helikite.classes.output_schemas.OutputSchema, input_folder: str | pathlib.Path) list[helikite.instruments.base.Instrument]

Automatically detect instruments from the files available in the input folder.

static choose_reference_instrument(output_schema: helikite.classes.output_schemas.OutputSchema, instruments: list[helikite.instruments.base.Instrument]) helikite.instruments.base.Instrument | None

Select a reference instrument for synchronization.

Parameters:
  • output_schema – Schema containing reference instrument candidates.

  • instruments – Detected instruments.

Returns:

Selected reference instrument.

Raises:

ValueError – If no suitable instrument is found.

classmethod get_expected_columns(output_schema: helikite.classes.output_schemas.OutputSchema, with_dtype: bool) list[str] | dict[str, str]

Generate expected dataframe columns at level 0.

Parameters:
  • output_schema – Schema containing campaign instruments.

  • with_dtype – Whether to include dtype mapping.

Returns:

List of column names or dict of column-to-dtype.

helikite.__version__ = '1.1.3'
helikite.__appname__ = 'helikite-data-processing'
helikite.__description__ = 'Library to generate quicklooks and data quality checks on Helikite campaigns'