helikite
Submodules
Attributes
Classes
Cleaner: Level 0 processor for synchronizing timestamps across instruments and merging their data into a unified structure.
Package Contents
- class helikite.Cleaner(output_schema: helikite.classes.output_schemas.OutputSchema, input_folder: str | pathlib.Path, flight_date: datetime.date, instruments: list[helikite.instruments.base.Instrument] | None = None, reference_instrument: helikite.instruments.base.Instrument | None = None, reference_instrument_shift: str | None = None, flight: str | None = None, time_takeoff: datetime.datetime | None = None, time_landing: datetime.datetime | None = None, time_offset: datetime.time = datetime.time(0, 0), interactive: bool = True)
Bases: helikite.classes.base.BaseProcessor
Level 0 processor for synchronizing timestamps across instruments and merging their data into a unified structure.
- property level: helikite.classes.output_schemas.Level
Processing level identifier.
- _instruments: list[helikite.instruments.base.Instrument] = []
- input_folder: str
- flight = None
- flight_date: datetime.date
- time_takeoff: datetime.datetime | None = None
- time_landing: datetime.datetime | None = None
- time_offset: datetime.time
- pressure_column: str = 'pressure'
- master_df: pandas.DataFrame | None = None
- housekeeping_df: pandas.DataFrame | None = None
- _reference_instrument: helikite.instruments.base.Instrument = None
- _reference_instrument_shift: str | None = None
- property df: pandas.DataFrame | None
Return the current state of the dataframe.
- _data_state_info() list[str]
- set_pressure_column(column_name_override: str | None = None) None
Set the pressure column for each instrument’s dataframe
- set_time_as_index() None
Set the time column as the index for each instrument dataframe
- data_corrections(start_altitude: float = None, start_pressure: float = None, start_temperature: float = None) None
Apply instrument-specific correction routines.
- plot_pressure() None
Creates a plot with the pressure measurement of each instrument
Assumes the pressure column has been set for each instrument
- plot_time_sync(save_path: str | pathlib.Path, skip: list[helikite.instruments.base.Instrument])
Visualize pressure alignment after time synchronization.
- remove_duplicates() None
Remove duplicate rows from each instrument based on time index, and clear repeated values in ‘msems_scan_’, ‘msems_inverted_’ columns, and specific ‘mcda_*’ columns, keeping only the first instance.
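The two steps described above (dropping rows with repeated timestamps, then blanking repeated values in selected columns) can be sketched with plain pandas. This is an illustration only, not helikite's implementation: the helper names `drop_duplicate_times` and `clear_repeated_values` are hypothetical, and treating "repeated" as "equal to the previous row" is an assumption.

```python
import pandas as pd


def drop_duplicate_times(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the first row for each repeated timestamp in the index."""
    return df[~df.index.duplicated(keep="first")]


def clear_repeated_values(df: pd.DataFrame, prefixes: list[str]) -> pd.DataFrame:
    """Blank values that equal the previous row, in columns matching the prefixes."""
    out = df.copy()
    for col in out.columns:
        if col.startswith(tuple(prefixes)):
            repeated = out[col].eq(out[col].shift())
            out[col] = out[col].mask(repeated)  # repeated values become NaN
    return out


# Hypothetical sample data: one duplicated timestamp, one repeated scan value.
index = pd.DatetimeIndex(
    [
        "2024-01-01 00:00:00",
        "2024-01-01 00:00:00",
        "2024-01-01 00:00:01",
        "2024-01-01 00:00:02",
    ]
)
raw = pd.DataFrame(
    {
        "pressure": [1000.0, 1000.5, 999.0, 998.0],
        "msems_scan_bin1": [5.0, 5.0, 5.0, 7.0],
    },
    index=index,
)
cleaned = clear_repeated_values(
    drop_duplicate_times(raw), ["msems_scan_", "msems_inverted_"]
)
```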
- merge_instruments(tolerance_seconds: int = 0, remove_duplicates: bool = True) None
Merges all the dataframes from the instruments into one dataframe.
All columns from all instruments are included in the merged dataframe, with unique prefixes to avoid column name collisions.
- Parameters:
tolerance_seconds (int) – The tolerance in seconds for merging dataframes.
remove_duplicates (bool) – If True, removes duplicate times and keeps the first result.
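As a rough illustration of merging time-indexed frames with a tolerance and per-instrument column prefixes, the sketch below uses `pandas.merge_asof`. Whether `merge_instruments` uses this function internally is not stated here, so treat the approach, the helper name `merge_with_tolerance`, and the sample data as assumptions.

```python
import pandas as pd


def merge_with_tolerance(
    frames: dict[str, pd.DataFrame], tolerance_seconds: float
) -> pd.DataFrame:
    """Merge time-indexed frames onto the first one, prefixing every column."""
    items = list(frames.items())
    name, first = items[0]
    merged = first.add_prefix(f"{name}_")
    for name, df in items[1:]:
        merged = pd.merge_asof(
            merged,
            df.add_prefix(f"{name}_"),  # prefix avoids column name collisions
            left_index=True,
            right_index=True,
            direction="nearest",
            tolerance=pd.Timedelta(seconds=tolerance_seconds),
        )
    return merged


# Hypothetical sample data; both indexes must be sorted for merge_asof.
ref_index = pd.DatetimeIndex(
    ["2024-01-01 00:00:00", "2024-01-01 00:00:01", "2024-01-01 00:00:02"],
    dtype="datetime64[ns]",
)
other_index = pd.DatetimeIndex(
    ["2024-01-01 00:00:00.2", "2024-01-01 00:00:10"], dtype="datetime64[ns]"
)
frames = {
    "ref": pd.DataFrame({"pressure": [1000.0, 999.0, 998.0]}, index=ref_index),
    "other": pd.DataFrame({"pressure": [1000.3, 950.0]}, index=other_index),
}
merged = merge_with_tolerance(frames, tolerance_seconds=0.5)
```

Rows of the first frame with no partner inside the tolerance keep NaN in the other instrument's columns, so no reference timestamps are lost.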
- export_data(filepath: str | pathlib.Path | None = None) None
Export all data columns from all instruments to local files.
The function exports a CSV file and a Parquet file containing all columns from all instruments. The files are saved in the current working directory unless a filepath is provided.
The Parquet file includes the metadata from the class.
- _apply_rolling_window_to_pressure(instrument, window_size: int = 20)
Apply a rolling window to the pressure measurements of an instrument,
then plot the pressure measurements with the rolling window applied.
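A rolling window over a pressure series can be sketched as follows. The aggregation used by helikite is not stated here; a rolling mean is assumed, and `smooth_pressure` is a hypothetical helper name.

```python
import pandas as pd


def smooth_pressure(pressure: pd.Series, window_size: int = 20) -> pd.Series:
    """Rolling mean; min_periods=1 avoids NaN at the start of the series."""
    return pressure.rolling(window=window_size, min_periods=1).mean()


# Hypothetical sample data with a small window to keep the result readable.
pressure = pd.Series([1.0, 2.0, 3.0, 4.0])
smoothed = smooth_pressure(pressure, window_size=2)
```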
- define_flight_times()
Creates a plot to select the start and end of the flight
Uses the pressure measurements of the reference instrument to select the start and end of the flight. The user can click on the plot to select the points.
- pressure_based_time_synchronization(max_lag: float = np.inf)
Time-synchronizes instruments by maximizing cross-correlation between the pressure data of the reference instrument and the other instruments. Runs multiple iterations using a coarse-to-fine scheme, starting with a coarse adjustment and refining until the final lags are found.
- Parameters:
max_lag (float, optional) – Maximum allowed time lag, in seconds, used at the first (coarsest) level. The signature default is np.inf. If None, it defaults to half of the flight duration.
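The coarse-to-fine idea can be illustrated with a small NumPy sketch that searches integer sample lags, first on a coarse grid and then on progressively finer grids around the best candidate. This is not helikite's code: `correlation_at_lag`, `best_lag`, the `coarse_step` parameter, and the synthetic pressure profile are all assumptions made for the example.

```python
import numpy as np


def correlation_at_lag(ref: np.ndarray, other: np.ndarray, lag: int) -> float:
    """Pearson correlation after undoing a delay of `lag` samples in `other`."""
    if lag > 0:
        a, b = ref[:-lag], other[lag:]
    elif lag < 0:
        a, b = ref[-lag:], other[:lag]
    else:
        a, b = ref, other
    if len(a) < 2 or a.std() == 0 or b.std() == 0:
        return -np.inf  # correlation undefined for empty or constant overlap
    return float(np.corrcoef(a, b)[0, 1])


def best_lag(
    ref: np.ndarray, other: np.ndarray, max_lag: int, coarse_step: int = 8
) -> int:
    """Search lags in [-max_lag, max_lag], halving the step at each level."""
    lo, hi, step = -max_lag, max_lag, coarse_step
    while True:
        candidates = range(lo, hi + 1, step)
        best = max(candidates, key=lambda lag: correlation_at_lag(ref, other, lag))
        if step == 1:
            return best
        # Narrow the window around the current best and refine the grid.
        lo, hi = max(best - step, -max_lag), min(best + step, max_lag)
        step = max(step // 2, 1)


# Synthetic flight: pressure dips during ascent and recovers on descent.
t = np.arange(300)
reference = 1000.0 - 100.0 * np.exp(-(((t - 150) / 30.0) ** 2))
# The second instrument records the same profile 13 samples late.
delayed = np.concatenate([np.full(13, reference[0]), reference[:-13]])
lag = best_lag(reference, delayed, max_lag=40)
```

Starting coarse keeps the number of correlation evaluations small when the allowed lag range is large, at the cost of assuming the correlation peak is broad enough to be visible on the coarse grid.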
- _get_best_instrument_lags(df_pressure, lags: numpy.ndarray) dict[helikite.instruments.base.Instrument, numbers.Number]
- correct_time_and_pressure(max_lag=180, walk_time_seconds: int | None = None, apply_rolling_window_to: list[helikite.instruments.base.Instrument] = [], rolling_window_size: int = constants.ROLLING_WINDOW_DEFAULT_SIZE, reference_pressure_thresholds: tuple[float, float] | None = None, detrend_pressure_on: list[helikite.instruments.base.Instrument] = [], offsets: list[tuple[helikite.instruments.base.Instrument, int]] = [], match_adjustment_with: list[tuple[helikite.instruments.base.Instrument, helikite.instruments.base.Instrument]] = [])
Correct time and pressure for each instrument based on time lag.
- Parameters:
max_lag (int) – The maximum time lag to consider for cross-correlation.
walk_time_seconds (int) – The time in seconds to walk the pressure data to match the reference instrument.
apply_rolling_window_to (list[Instrument]) – A list of instruments to apply a rolling window to the pressure data.
rolling_window_size (int) – The size of the rolling window to apply to the pressure data.
reference_pressure_thresholds (tuple[float, float]) – A tuple with two values (low, high) to apply a threshold to the reference instrument’s pressure data.
detrend_pressure_on (list[Instrument]) – A list of instruments to detrend the pressure data.
offsets (list[tuple[Instrument, int]]) – A list of tuples with an instrument and an offset in seconds to apply to the time index.
match_adjustment_with (list[tuple[Instrument, Instrument]]) – A list of tuples of two instruments that should share the same time adjustment. This can be used, for example, when an instrument does not have a pressure column and therefore reuses the time adjustment from another instrument. The first instrument in each tuple is the one that has the index adjustment, and the second instrument is the one that will be adjusted.
- _build_df_pressure()
- shift_msems_columns_by_90s()
Shift all ‘msems_inverted_’ and ‘msems_scan_’ columns by 90 seconds in time.
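Shifting only a subset of columns in time, while leaving the index and the other columns untouched, can be sketched as below. The direction of the shift (values moved 90 seconds later) is an assumption, as is the helper name `shift_columns`.

```python
import numpy as np
import pandas as pd


def shift_columns(
    df: pd.DataFrame, prefixes: list[str], seconds: int
) -> pd.DataFrame:
    """Move values of matching columns `seconds` later along the time index."""
    cols = [c for c in df.columns if c.startswith(tuple(prefixes))]
    shifted = df[cols].shift(freq=pd.Timedelta(seconds=seconds))
    out = df.copy()
    # Re-align the shifted values to the original timestamps.
    out[cols] = shifted.reindex(out.index)
    return out


# Hypothetical 1 Hz sample data.
index = pd.date_range("2024-01-01", periods=200, freq=pd.Timedelta(seconds=1))
values = np.arange(200, dtype=float)
df = pd.DataFrame({"pressure": values, "msems_scan_bin1": values}, index=index)
result = shift_columns(df, ["msems_inverted_", "msems_scan_"], seconds=90)
```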
- fill_missing_timestamps(instrument: helikite.instruments.base.Instrument, freq: str = '1S', fill_method: str | None = None)
Reindex the DataFrame of the instrument to fill in missing timestamps at the specified frequency. Optionally forward- or backward-fill missing values. Prints the number of timestamps added.
- Parameters:
instrument (Instrument) – The instrument whose DataFrame (with a DatetimeIndex) is reindexed.
freq (str) – The desired frequency for the DatetimeIndex (e.g., “1S” for 1 second).
fill_method (str or None) – Method to fill missing values: “ffill”, “bfill”, or None (default: None).
- static detect_instruments(output_schema: helikite.classes.output_schemas.OutputSchema, input_folder: str | pathlib.Path) list[helikite.instruments.base.Instrument]
Automatically detect instruments from the files available in the input folder.
- static choose_reference_instrument(output_schema: helikite.classes.output_schemas.OutputSchema, instruments: list[helikite.instruments.base.Instrument]) helikite.instruments.base.Instrument | None
Select a reference instrument for synchronization.
- Parameters:
output_schema – Schema containing reference instrument candidates.
instruments – Detected instruments.
- Returns:
Selected reference instrument.
- Raises:
ValueError – If no suitable instrument is found.
- classmethod get_expected_columns(output_schema: helikite.classes.output_schemas.OutputSchema, with_dtype: bool) list[str] | dict[str, str]
Generate expected dataframe columns at level 0.
- Parameters:
output_schema – Schema containing campaign instruments.
with_dtype – Whether to include dtype mapping.
- Returns:
List of column names or dict of column-to-dtype.
- helikite.__version__ = '1.1.3'
- helikite.__appname__ = 'helikite-data-processing'
- helikite.__description__ = 'Library to generate quicklooks and data quality checks on Helikite campaigns'