2.1.3. rainfallqc.utils package
2.1.3.1. Submodules
2.1.3.2. rainfallqc.utils.data_readers module
Data loading tools.
Classes for reading rain gauge network data at bottom of file.
- class rainfallqc.utils.data_readers.GPCCNetworkReader(path_to_gpcc_dir: str, time_res: str, file_format: str = '.zip', unzipped_file_format: str = '.dat')
Bases: GaugeNetworkReader
GPCC rain gauge network reader.
Methods
get_nearest_overlapping_neighbours_to_target(...): Get IDs of the nearest neighbours to a target whilst checking that there is at least a minimum time overlap.
load_network_data(data_paths, target_gauge_col): Load GPCC network data based on file paths.
- load_network_data(data_paths: List[str] | ndarray[str], target_gauge_col: str, missing_val: int | float = -999.9) -> DataFrame
Load GPCC network data based on file paths.
- Parameters:
- data_paths
Paths to load network data from.
- target_gauge_col
Rainfall data column
- missing_val
Missing value (default: -999.9)
- Returns:
- network_data
Dataframe of GPCC gauges.
- class rainfallqc.utils.data_readers.GSDRNetworkReader(path_to_gsdr_dir: str, file_format: str = '.txt')
Bases: GaugeNetworkReader
GSDR rain gauge network reader.
Methods
get_nearest_overlapping_neighbours_to_target(...): Get IDs of the nearest neighbours to a target whilst checking that there is at least a minimum time overlap.
load_network_data(rain_col_prefix, data_paths): Load GSDR network data based on file paths.
- load_network_data(rain_col_prefix: str, data_paths: List[str] | ndarray[str], suffix_only: bool = False, gsdr_header_rows: int = 20) -> DataFrame
Load GSDR network data based on file paths.
- Parameters:
- data_paths
Paths to load network data from.
- rain_col_prefix
Prefix for rain column name (e.g. ‘rain’)
- suffix_only
Override to only include the suffix (e.g. if the column name is the ID)
- gsdr_header_rows
Number of rows to skip in the header of the GSDR data (default=20)
- Returns:
- network_data
Dataframe of GSDR gauges.
- class rainfallqc.utils.data_readers.GaugeNetworkReader(path_to_gauge_network: str)
Bases: ABC
Base class for reading rain gauge networks.
Methods
get_nearest_overlapping_neighbours_to_target(...): Get IDs of the nearest neighbours to a target whilst checking that there is at least a minimum time overlap.
- get_nearest_overlapping_neighbours_to_target(target_id: str, distance_threshold: int | float, n_closest: int, min_overlap_days: int) -> set
Get IDs of the nearest neighbours to a target whilst checking that there is at least a minimum time overlap.
- Parameters:
- target_id
Target gauge to get neighbour IDs of
- distance_threshold
Distance threshold to check for neighbours
- n_closest
Number of nearest neighbours to return
- min_overlap_days
Minimum time overlap between neighbours to return
- Returns:
- neighbouring_gauge_id
IDs of neighbouring gauges within a given distance to target and min overlapping days
- rainfallqc.utils.data_readers.add_datetime_to_gsdr_data(gsdr_data: DataFrame, gsdr_metadata: dict, multiplying_factor: int | float) -> DataFrame
Add datetime column to GSDR gauge data using metadata from that gauge.
NOTE: Could maybe extend so can find metadata if not provided?
- Parameters:
- gsdr_data
GSDR data
- gsdr_metadata
Metadata from GSDR file
- multiplying_factor : int or float
Factor to multiply the data by.
- Returns:
- gsdr_data
GSDR data with datetime column added
- rainfallqc.utils.data_readers.convert_gsdr_metadata_dates_to_datetime(gsdr_metadata: dict) -> dict
Convert GSDR metadata date string column to datetime.
- Parameters:
- gsdr_metadata
Metadata from GSDR file
- Returns:
- gsdr_metadata : dict
Metadata from GSDR file with start and end date columns
- rainfallqc.utils.data_readers.get_paths_using_gauge_ids(gauge_ids: List[str] | ndarray[str], dir_path: str, file_format: str, time_res: str = None) -> dict
Get data paths for gauge IDs.
- Parameters:
- gauge_ids
Array of gauge IDs
- dir_path
Path to data directory
- file_format
Format of files in directory.
- time_res
Time resolution (e.g. ‘mw’ or ‘tw’)
- Returns:
- gauge_paths
Dictionary of gauge ID and path
- rainfallqc.utils.data_readers.load_etccdi_data(etccdi_var: str, path_to_etccdi: str = None) -> Dataset
Load ETCCDI data.
- Parameters:
- etccdi_var
variable to load from ETCCDI
- path_to_etccdi
path to ETCCDI data (default is location of data in tests)
- Returns:
- etccdi_data
Loaded data
- rainfallqc.utils.data_readers.load_gpcc_gauge_network_metadata(path_to_gpcc_dir: str, time_res: str, gpcc_file_format: str = '.dat') -> DataFrame
Load metadata from GPCC gauges from a directory.
- Parameters:
- path_to_gpcc_dir
Path to directory with GPCC gauges
- time_res
Time resolution (e.g. ‘mw’ or ‘tw’)
- gpcc_file_format
Format of file (default is .dat)
- Returns:
- all_station_metadata
All GPCC gauges metadata as one dataframe.
- rainfallqc.utils.data_readers.load_gsdr_gauge_network_metadata(path_to_gsdr_dir: str, file_format: str = '.txt') -> DataFrame
Load metadata from GSDR gauges from a directory.
- Parameters:
- path_to_gsdr_dir
Path to directory with GSDR gauges
- file_format
Format of file (default is .txt)
- Returns:
- all_station_metadata
All GSDR gauges metadata as one dataframe.
- rainfallqc.utils.data_readers.read_gpcc_data_from_zip(data_path: str, gpcc_file_name: str, target_gauge_col: str, time_res: str, hour_offset: int = 7, missing_val: int | float = -999) -> DataFrame
Read the specific format and header of Global Precipitation Climatology Centre (GPCC) files.
- Parameters:
- data_path
path to GPCC zip file
- gpcc_file_name
Name of GPCC file within zip
- target_gauge_col
Name of rainfall column
- time_res
‘daily’ or ‘monthly’
- hour_offset
Hours to offset grouped data by (default is 7)
- missing_val
Missing value (default: -999)
- Returns:
- gpcc_data : DataFrame
Data from GPCC file
- rainfallqc.utils.data_readers.read_gpcc_metadata_from_zip(data_path: str, time_res: str, gpcc_file_format: str = '.dat') -> dict
Read GPCC metadata from zip file.
- Parameters:
- data_path
path to GPCC zip file.
- time_res
Time resolution of data (e.g. daily or monthly)
- gpcc_file_format
Default GPCC file format (default: .dat)
- Returns:
- metadata
Metadata from GPCC file
- rainfallqc.utils.data_readers.read_gsdr_data_from_file(data_path: str, raw_data_time_res: str, rain_col_prefix: str = None, rain_col_suffix: str = None, suffix_only: bool = False, gsdr_header_rows: int = 20) -> DataFrame
Read GSDR data from file.
Note: this was developed on the GSDR data available from IntenseQC, so it expects a number of header rows in the data.
- Parameters:
- data_path
Path to GSDR data file
- raw_data_time_res
Time resolution of data record i.e. ‘hourly’ or ‘daily’
- rain_col_prefix
Prefix for column for target_gauge_col (set as None by default)
- rain_col_suffix
Suffix for column name for target_gauge_col (set as None by default)
- suffix_only
Override to only include the suffix (e.g. if the column name is the ID)
- gsdr_header_rows
Number of rows to skip in the header of the GSDR data (default=20)
- Returns:
- gsdr_data
GSDR data as Pandas DataFrame
2.1.3.3. rainfallqc.utils.data_utils module
All data operations for polars including datetime and calendar functionality.
Classes and functions ordered alphabetically.
- rainfallqc.utils.data_utils.back_propagate_daily_data_flags(data: DataFrame, flag_column: str, num_days: int) -> DataFrame
Back-fill flags by a number of days.
This will prioritise higher flag values.
- Parameters:
- data
Daily data with flag_column
- flag_column
column with flags
- num_days
Number of days to back-propagate
- Returns:
- data
Data with flags back-propagated
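The back-propagation idea can be sketched in plain Python. This is not the rainfallqc implementation (which works on polars DataFrames); the helper name `back_propagate_flags` is hypothetical, and the sketch assumes that each day takes the highest flag found on that day or on the following `num_days` days.

```python
def back_propagate_flags(flags, num_days):
    """Hypothetical helper: copy each flag back onto the preceding num_days days.

    Where several flags compete for the same day, the highest value wins.
    """
    out = []
    for i in range(len(flags)):
        window = flags[i : i + num_days + 1]  # today plus the next num_days days
        out.append(max(window))
    return out


daily_flags = [0, 0, 0, 2, 0, 1, 0]
propagated = back_propagate_flags(daily_flags, num_days=2)
print(propagated)
```

In this sketch the flag of 2 spreads back over the two preceding days, and the later flag of 1 only fills days that do not already carry a higher value.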
- rainfallqc.utils.data_utils.calculate_dry_spell_fraction(data: DataFrame, target_gauge_col: str, dry_period_days: int) -> Series
Calculate dry spell fraction.
- Parameters:
- data
Data with time column
- target_gauge_col
Column with rainfall data
- dry_period_days
Length of a “dry_spell”
- Returns:
- rain_daily_dry_day
Data with dry spell fraction
- rainfallqc.utils.data_utils.check_data_has_consistent_time_step(data: DataFrame) -> None
Check data has a consistent time step i.e. ‘1h’.
- Parameters:
- data
Data with time column
- Raises:
- ValueError
If data has more than one time step
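A minimal sketch of this kind of check, using only the standard library rather than polars. The helper names `unique_time_steps` and `check_consistent_time_step` are hypothetical, and the input is assumed to be a sorted list of datetimes rather than a DataFrame with a time column.

```python
from datetime import datetime, timedelta


def unique_time_steps(times):
    """Hypothetical helper: set of distinct gaps between consecutive timestamps."""
    return {later - earlier for earlier, later in zip(times, times[1:])}


def check_consistent_time_step(times):
    """Raise ValueError when more than one distinct time step is present."""
    steps = unique_time_steps(times)
    if len(steps) > 1:
        raise ValueError(f"Data has {len(steps)} different time steps")


start = datetime(2020, 1, 1)
hourly = [start + timedelta(hours=h) for h in range(5)]
check_consistent_time_step(hourly)  # passes silently

# Dropping some hours leaves both a 1-hour and a 4-hour gap.
gappy = hourly[:3] + [start + timedelta(hours=6)]
try:
    check_consistent_time_step(gappy)
    raised = False
except ValueError:
    raised = True
```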
- rainfallqc.utils.data_utils.check_data_is_monthly(data: DataFrame) -> None
Check data is monthly.
- Parameters:
- data
Data with time column
- Raises:
- ValueError
If data has no monthly time steps
- rainfallqc.utils.data_utils.check_data_is_specific_time_res(data: DataFrame, time_res: str | list) -> None
Check data has an hourly or daily time step.
Does not work for monthly data, please use ‘check_data_is_monthly’.
- Parameters:
- data
Data with time column.
- time_res
Time resolution(s), either a single string or a list of strings
- Raises:
- ValueError
If data is not hourly or daily.
- rainfallqc.utils.data_utils.check_for_negative_values(df: DataFrame, target_gauge_col: str) -> bool
Check if the target column contains any negative values.
- Parameters:
- df
DataFrame to check.
- target_gauge_col
Column to check for negative values.
- Raises:
- ValueError
If negative values are found in the target column.
- rainfallqc.utils.data_utils.convert_daily_data_to_monthly(daily_data: DataFrame, rain_cols: list, perc_for_valid_month: int | float = 95) -> DataFrame
Convert daily data to monthly whilst setting a month to NaN if less than a given percentage of its days is present.
- Parameters:
- daily_data
Daily data to convert to monthly
- rain_cols
Columns with rainfall data
- perc_for_valid_month
Percentage of month needed to be classed as a valid month for the monthly group by
- Returns:
- monthly_data
Monthly data
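The validity rule can be illustrated with a standard-library sketch. It is not the library's polars code: the helper name `daily_to_monthly`, the dict-based input, the use of a monthly sum, and the `>=` comparison against `perc_for_valid_month` are all assumptions made for illustration.

```python
import calendar
from collections import defaultdict
from datetime import date, timedelta


def daily_to_monthly(daily, perc_for_valid_month=95):
    """Hypothetical helper: sum daily totals per month.

    `daily` maps date -> rainfall (None marks a missing day). A month is set
    to None when the share of non-missing days is below perc_for_valid_month.
    """
    by_month = defaultdict(list)
    for day, value in daily.items():
        if value is not None:
            by_month[(day.year, day.month)].append(value)

    monthly = {}
    for (year, month), values in sorted(by_month.items()):
        expected_days = calendar.monthrange(year, month)[1]
        perc_present = 100 * len(values) / expected_days
        monthly[(year, month)] = sum(values) if perc_present >= perc_for_valid_month else None
    return monthly


# January 2021 is complete; February 2021 has only 20 of its 28 days.
jan = {date(2021, 1, 1) + timedelta(days=i): 1.0 for i in range(31)}
feb = {date(2021, 2, 1) + timedelta(days=i): 2.0 for i in range(20)}
monthly_totals = daily_to_monthly({**jan, **feb})
```

Here January keeps its total, while February (about 71% complete) is marked as missing.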
- rainfallqc.utils.data_utils.convert_datarray_seconds_to_days(series_seconds: DataArray) -> ndarray
Convert xarray series from seconds to days. For some reason the CDD data from ETCCDI is in seconds.
- Parameters:
- series_seconds
Data in series to convert to days.
- Returns:
- series_days
Data array converted to days.
- rainfallqc.utils.data_utils.downsample_and_fill_columns(high_res_data: DataFrame, low_res_data: DataFrame, data_cols: str | list[str], fill_limit: int, fill_method: str = 'backward', time_col: str = 'time') -> DataFrame
Join columns from lower resolution data to higher resolution data and fill gaps.
- Parameters:
- high_res_data
Higher resolution data (e.g., 15-min)
- low_res_data
Lower resolution data with columns to join (e.g., hourly)
- data_cols
Column name(s) to join and fill. Can be:
- Single column name: “rainfall”
- List of columns: [“rain1”, “rain2”]
- Regex pattern: “^rain.*$”
- fill_limit
Maximum number of intervals to fill
- fill_method
“forward”, “backward”, or “none”
- time_col
Name of time column (default: ‘time’)
- Returns:
- high_res_data_filled
High resolution data with filled columns
- rainfallqc.utils.data_utils.downsample_monthly_data(sub_monthly_data: DataFrame, monthly_data: DataFrame, data_cols: str | list[str], time_col: str = 'time') -> DataFrame
Join monthly data to sub-monthly (e.g. hourly) data and fill only within the same month.
- Parameters:
- sub_monthly_data
Sub-monthly data (e.g., hourly)
- monthly_data
Monthly data with columns to join
- data_cols
Column name(s) to join and fill. Can be:
- Single column name: “rainfall”
- List of columns: [“rain1”, “rain2”]
- time_col
Name of time column (default: ‘time’)
- Returns:
- result
Sub-monthly data with monthly columns joined and filled within month
- rainfallqc.utils.data_utils.extract_negative_values_from_data(data: DataFrame, cols_to_extract_from: list) -> DataFrame
Extract negative values from data.
- Parameters:
- data
Rainfall data.
- cols_to_extract_from
Columns to extract negative values from
- Returns:
- data
Data with only negative values or 0.
- rainfallqc.utils.data_utils.extract_positive_values_from_data(data: DataFrame, cols_to_extract_from: list) -> DataFrame
Extract positive values from data.
- Parameters:
- data
Rainfall data.
- cols_to_extract_from
Columns to extract positive values from
- Returns:
- data
Data with only positive values or 0.
- rainfallqc.utils.data_utils.format_timedelta_duration(td: timedelta) -> str
Convert timedelta to custom strings.
- Parameters:
- td
Time delta to convert.
- Returns:
- td
Human-readable timedelta string using largest unit (d, h, m, s).
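One way such a conversion might look, as a sketch only. The helper name `format_duration` is hypothetical, and the rule used here (pick the largest of d, h, m that divides the duration exactly, otherwise fall back to seconds) is an assumption about what "largest unit" means.

```python
from datetime import timedelta


def format_duration(td):
    """Hypothetical helper: express td in the largest unit that divides it exactly."""
    total_seconds = int(td.total_seconds())
    for unit_seconds, suffix in ((86400, "d"), (3600, "h"), (60, "m")):
        if total_seconds >= unit_seconds and total_seconds % unit_seconds == 0:
            return f"{total_seconds // unit_seconds}{suffix}"
    return f"{total_seconds}s"


labels = [
    format_duration(timedelta(days=1)),
    format_duration(timedelta(hours=1)),
    format_duration(timedelta(minutes=15)),
    format_duration(timedelta(seconds=90)),
]
print(labels)
```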
- rainfallqc.utils.data_utils.get_data_timestep_as_str(data: DataFrame) -> str
Get time step of data.
- Parameters:
- data
Data with time column
- Returns:
- time_step
Time step of data i.e. ‘1h’, ‘1d’, ‘15m’.
- rainfallqc.utils.data_utils.get_data_timesteps(data: DataFrame) -> Series
Get data timesteps. Ideally the data should have only one.
- Parameters:
- data
Data with time column.
- Returns:
- unique_timesteps
All unique time steps in data (timedelta).
- rainfallqc.utils.data_utils.get_dry_period_proportions(dry_period_days: int) -> dict
Get dry period proportions.
- Parameters:
- dry_period_days
Length of a “dry_spell” (e.g. 15 days)
- Returns:
- fraction_dry_days
Dictionary with keys “1”, “2”, “3” with dry spell fractions
- rainfallqc.utils.data_utils.get_dry_spells(data: DataFrame, target_gauge_col: str) -> DataFrame
Get dry spell column.
- Parameters:
- data
Rainfall data
- target_gauge_col
Column with rainfall data
- Returns:
- data_w_dry_spells
Data with is_dry binary column
- rainfallqc.utils.data_utils.get_expected_days_in_month(data: DataFrame) -> DataFrame
Get the expected number of days in each month within the data.
- Parameters:
- data
Data with ‘year’ and ‘month’ columns
- Returns:
- data
Data with ‘expected_days_in_month’ column
- rainfallqc.utils.data_utils.get_normalised_diff(data: DataFrame, target_col: str, other_col: str, diff_col_name: str) -> DataFrame
Get normalised difference between two columns in data.
- Parameters:
- data
Data with columns
- target_col
Target column
- other_col
Other column.
- diff_col_name
New column name for difference column
- Returns:
- data_w_norm_diff
Data with normalised diff
- rainfallqc.utils.data_utils.make_month_and_year_col(data: DataFrame) -> DataFrame
Make year and month columns for polars dataframe.
- Parameters:
- data
Data with time column
- Returns:
- data
Data with year and month columns
- rainfallqc.utils.data_utils.normalise_data(data: Series | Expr) -> Series
Normalise data to [0, 1].
- Parameters:
- data
Data to normalise.
- Returns:
- norm_data
Normalised data.
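Normalising to [0, 1] is commonly done with min-max scaling. The sketch below assumes that this is the scaling intended; the helper name `min_max_normalise` and the handling of constant input are hypothetical, and a plain list stands in for a polars Series.

```python
def min_max_normalise(values):
    """Hypothetical helper: rescale values linearly so min -> 0 and max -> 1."""
    lowest, highest = min(values), max(values)
    span = highest - lowest
    if span == 0:
        # Constant input: map every value to 0 rather than dividing by zero (an assumption).
        return [0.0 for _ in values]
    return [(v - lowest) / span for v in values]


normalised = min_max_normalise([2.0, 4.0, 6.0, 10.0])
print(normalised)
```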
- rainfallqc.utils.data_utils.offset_data_by_time(data: DataFrame, target_col: str, offset_in_time: int, time_res: str) -> DataFrame
Shift/offset data either backwards or forwards in time.
- Parameters:
- data
Data with column to offset in ‘time’
- target_col
Column of data to offset
- offset_in_time
Amount to offset data by i.e. 1 for 1 day if time_res set to ‘1d’
- time_res
Time resolution like ‘hourly’, ‘daily’, ‘1h’ or ‘1d’
- Returns:
- data
Offset data by ‘offset_in_time’ amount
- rainfallqc.utils.data_utils.replace_missing_vals_with_nan(data: DataFrame, target_gauge_col: str, missing_val: int = None) -> DataFrame
Replace no data value with numpy.nan.
- Parameters:
- data
Rainfall data
- target_gauge_col
Column of rainfall
- missing_val
Missing value identifier
- Returns:
- gsdr_data
GSDR data with missing values replaced
- rainfallqc.utils.data_utils.resample_data_by_time_step(data: DataFrame, rain_cols: List[str], time_col: str, time_step: str, min_count: int, hour_offset: int) -> DataFrame
Group data into a coarser time step (e.g. hourly into daily) and check for a minimum number of time steps per period (e.g. at least 24 hourly time steps per day).
- Parameters:
- data
Rainfall data to resample
- rain_cols
List of columns with rainfall data
- time_col
Name of time column
- time_step
Time step to resample into (e.g. ‘1d’ for daily, ‘1h’ for hourly, ‘15m’ for 15 minute)
- min_count
Minimum number of time steps needed per time period
- hour_offset
Time offset in hours (needed if data is not aligned to midnight)
- Returns:
- resampled_data
Rainfall data grouped into a given time step
2.1.3.4. rainfallqc.utils.neighbourhood_utils module
All neighbourhood and nearby related operations.
- rainfallqc.utils.neighbourhood_utils.compute_km_distances_from_target_id(gauge_network_metadata: DataFrame, target_id: str, station_id_col: str) -> DataFrame
Compute kilometre distances between gauges in network and the target gauge.
- Parameters:
- gauge_network_metadata
Metadata for gauge network. Each gauge must have ‘longitude’ and ‘latitude’.
- target_id
Target gauge to compare against.
- station_id_col
Column name for station ID in gauge_network_metadata
- Returns:
- neighbour_distances_df
Data of distances to a target gauge in kilometres
- rainfallqc.utils.neighbourhood_utils.compute_temporal_overlap_days(start_1: datetime, end_1: datetime, start_2: datetime, end_2: datetime) -> int
Compute temporal overlap in days.
Note: assumes that the data is contiguous.
- Parameters:
- start_1
Start time of timestamp 1
- end_1
End time of timestamp 1
- start_2
Start time of timestamp 2
- end_2
End time of timestamp 2
- Returns:
- overlap_days
Days that overlap between the two timestamps
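For two contiguous records, the overlap is the span between the later start and the earlier end. The sketch below shows that calculation with the standard library; the helper name `overlap_days` is hypothetical and clamping to zero for non-overlapping records is an assumption.

```python
from datetime import datetime


def overlap_days(start_1, end_1, start_2, end_2):
    """Hypothetical helper: whole days shared by two contiguous records."""
    latest_start = max(start_1, start_2)
    earliest_end = min(end_1, end_2)
    # A negative span means the records do not overlap at all.
    return max(0, (earliest_end - latest_start).days)


shared = overlap_days(
    datetime(2000, 1, 1), datetime(2010, 1, 1),
    datetime(2005, 1, 1), datetime(2015, 1, 1),
)
disjoint = overlap_days(
    datetime(2000, 1, 1), datetime(2001, 1, 1),
    datetime(2002, 1, 1), datetime(2003, 1, 1),
)
```

Because only start and end times are compared, any gaps inside either record are ignored, which is why the contiguity note matters.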
- rainfallqc.utils.neighbourhood_utils.compute_temporal_overlap_days_from_target_id(gauge_network_metadata: DataFrame, target_id: str, station_id_col: str, start_datetime_col: str, end_datetime_col: str) -> DataFrame
Compute overlap in days between the target gauge and its neighbours.
Note: assumes that the data is contiguous.
- Parameters:
- gauge_network_metadata
Metadata for gauge network. Each gauge must have ‘longitude’ and ‘latitude’.
- target_id
Target gauge to compare against.
- station_id_col
Column name for station ID in gauge_network_metadata
- start_datetime_col
Column name for start datetime in gauge_network_metadata
- end_datetime_col
Column name for end datetime in gauge_network_metadata
- Returns:
- neighbour_overlap_days_df
Neighbouring gauges with overlap days to target gauge.
- rainfallqc.utils.neighbourhood_utils.get_ids_of_n_nearest_overlapping_neighbouring_gauges(gauge_network_metadata: DataFrame, target_id: str, distance_threshold: int | float, n_closest: int, min_overlap_days: int, station_id_col: str = 'station_id', start_datetime_col: str = 'start_datetime', end_datetime_col: str = 'end_datetime') -> list
Get gauge IDs of nearest n time-overlapping neighbouring gauges.
- Parameters:
- gauge_network_metadata
Metadata for gauge network. Each gauge must have ‘longitude’ and ‘latitude’.
- target_id
Target gauge to compare against.
- distance_threshold
Threshold for maximum distance considered
- n_closest
Number of closest neighbours.
- min_overlap_days
Minimum overlap between target and neighbouring gauges
- station_id_col
Column name for station ID in gauge_network_metadata (default ‘station_id’)
- start_datetime_col
Column name for start datetime in gauge_network_metadata (default ‘start_datetime’)
- end_datetime_col
Column name for end datetime in gauge_network_metadata (default ‘end_datetime’)
- Returns:
- neighbouring_gauge_id
IDs of neighbouring gauges within a given distance to target and min overlapping days
- rainfallqc.utils.neighbourhood_utils.get_n_closest_neighbours(neighbour_distances_df: DataFrame, distance_threshold: int | float, n_closest: int) -> DataFrame
Get closest neighbours from neighbour distances data.
Will return more than n_closest neighbours if multiple values are equal at that index. Will not return gauges that are at zero distance.
- Parameters:
- neighbour_distances_df
Data of distances to a target gauge
- distance_threshold
Threshold for maximum distance considered
- n_closest
Number of closest neighbours.
- Returns:
- n_closest_neighbour_df
Data of n_closest neighbours
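The tie and zero-distance behaviour described above can be sketched without polars. The helper name `n_closest_neighbours` is hypothetical, the input is a plain dict instead of a DataFrame, and treating the distance threshold as inclusive is an assumption.

```python
def n_closest_neighbours(distances, distance_threshold, n_closest):
    """Hypothetical helper: pick the nearest gauges, keeping ties at the cut-off.

    `distances` maps gauge ID -> distance (km) from the target gauge.
    """
    # Drop the target itself (zero distance) and anything beyond the threshold.
    candidates = sorted(
        (dist, gauge_id)
        for gauge_id, dist in distances.items()
        if 0 < dist <= distance_threshold
    )
    if len(candidates) <= n_closest:
        return [gauge_id for _, gauge_id in candidates]
    cutoff = candidates[n_closest - 1][0]  # distance of the n-th nearest gauge
    return [gauge_id for dist, gauge_id in candidates if dist <= cutoff]


distances_km = {"target": 0.0, "A": 5.0, "B": 12.0, "C": 12.0, "D": 30.0, "E": 80.0}
nearest = n_closest_neighbours(distances_km, distance_threshold=50, n_closest=2)
print(nearest)
```

Asking for two neighbours returns three here, because gauges B and C are tied at the cut-off distance.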
- rainfallqc.utils.neighbourhood_utils.get_nearest_non_nan_etccdi_val_to_gauge(etccdi_data: Dataset, etccdi_name: str, gauge_lat: int | float, gauge_lon: int | float, max_distance_km: int | float = 500) -> Dataset
Get the value at the nearest non-nan ETCCDI grid cell to the gauge coordinates.
- Parameters:
- etccdi_data
ETCCDI data with given variable to check
- etccdi_name
ETCCDI variable name to check
- gauge_lat
latitude of the rain gauge
- gauge_lon
longitude of the rain gauge
- max_distance_km
Maximum distance in km to search for a non-nan value (default 500 km)
- Returns:
- nearby_etccdi_data
ETCCDI data at the nearest grid cell with non-nan values
- rainfallqc.utils.neighbourhood_utils.get_neighbours_with_min_overlap_days(neighbour_overlap_days_df: DataFrame, min_overlap_days: int) -> DataFrame
Get neighbours around gauge with at least min_overlap_days of overlapping time steps.
Note: assumes that the data is contiguous.
- Parameters:
- neighbour_overlap_days_df
Neighbouring gauges with overlap days to target gauge.
- min_overlap_days
Minimum overlap between target and neighbouring gauges
- Returns:
- neighbour_overlap_days_df
Neighbouring gauges with at least min_overlap_days overlap days.
- rainfallqc.utils.neighbourhood_utils.get_rain_not_minima_column(data: DataFrame, target_col: str, other_col: str) -> DataFrame
Get rain not equal to minima column.
Combines two functions: getting the non-zero minima (e.g. 0.1) and then making the ‘rain_not_minima’ column.
- Parameters:
- data
Rainfall data
- target_col
Target rainfall column
- other_col
Other rainfall column
- Returns:
- data_w_minima_col
Rainfall data with ‘rain_not_minima’ column
- rainfallqc.utils.neighbourhood_utils.get_target_neighbour_non_zero_minima(data: DataFrame, target_col: str, other_col: str, default_minima: float = 0.1) -> float
Get minimum non-zero value in rainfall data between target and neighbour.
- Parameters:
- data
Rainfall data
- target_col
Target rainfall column
- other_col
Other rainfall column
- default_minima
Default minimum to use for non-zero value
- Returns:
- non_zero_minima
Minimum non-zero value.
- rainfallqc.utils.neighbourhood_utils.make_rain_not_minima_column_target_or_neighbour(data: DataFrame, target_col: str, other_col: str, data_minima: float) -> DataFrame
Get rain values that are not minima rainfall for target or neighbour.
- Parameters:
- data
Rainfall data
- target_col
Target rainfall column
- other_col
Other rainfall column
- data_minima
Data minimum (i.e. lowest non-zero value)
- Returns:
- data
Rainfall data with “rain_not_minima” column
2.1.3.5. rainfallqc.utils.spatial_utils module
All spatial operations.
Classes and functions ordered alphabetically.
- rainfallqc.utils.spatial_utils.compute_spatial_mean_xr(data: Dataset, var_name: str) -> Dataset
Compute the spatial mean of a variable in an xarray Dataset.
- Parameters:
- data
Data with variable to compute mean from. Should have lat/lon and time (as axis 0)
- var_name
Variable to make mean value of
- Returns:
- data
Data with spatial mean
- rainfallqc.utils.spatial_utils.haversine(lon1: DataArray, lat1: DataArray, lon2: ndarray | float, lat2: ndarray | float) -> float
Great circle distance (km) between two points on Earth.
- Parameters:
- lon1 : xr.DataArray
Longitude of point 1
- lat1 : xr.DataArray
Latitude of point 1
- lon2 : np.ndarray | float
Longitude of point 2
- lat2 : np.ndarray | float
Latitude of point 2
- Returns:
- distance : float
Distance between the two points in km
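For reference, the haversine formula can be written for scalar inputs using only the `math` module. This is a sketch, not the library function (which accepts xarray and NumPy inputs); the name `haversine_km` and the Earth radius of 6371 km are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed value


def haversine_km(lon1, lat1, lon2, lat2):
    """Hypothetical helper: great circle distance in km between two points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude along the equator, and London to Paris.
one_degree = haversine_km(0.0, 0.0, 1.0, 0.0)
london_paris = haversine_km(-0.1278, 51.5074, 2.3522, 48.8566)
```

Note the argument order: longitude comes before latitude, matching the signature above.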
2.1.3.6. rainfallqc.utils.stats module
Statistical tests and other indices for rainfall data quality control.
Classes and functions ordered alphabetically.
- rainfallqc.utils.stats.affinity_index(data: DataFrame, binary_col: str, return_match_and_diff: bool = False) -> tuple | float
Calculate affinity index from binary column.
- Parameters:
- data
Rainfall data
- binary_col
Column with binary data
- return_match_and_diff
Whether to return count of matching and difference columns as well as affinity index.
- Returns:
- affinity
Affinity index.
- rainfallqc.utils.stats.dry_spell_fraction(rain_daily: DataFrame, target_gauge_col: str, dry_period_days: int) -> Series
Make dry spell fraction column.
- Parameters:
- rain_daily
Single time-step of rainfall data with ‘dry_day’ column
- target_gauge_col
Column with Rainfall data
- dry_period_days
Dry periods window in days
- Returns:
- rain_daily_w_dry_spell_fraction
Single row with dry spell fraction column
- rainfallqc.utils.stats.factor_diff(data: DataFrame, target_col: str, other_col: str) -> DataFrame
Compute factor diff for polars.
- Parameters:
- data
Rainfall data
- target_col
Target column to compute factor diff for
- other_col
Other column to compute factor diff for
- Returns:
- data_w_factor_diff
Data with factor diff
- rainfallqc.utils.stats.filter_out_rain_world_records(data: DataFrame, target_gauge_col: str, time_res: str) -> DataFrame
Filter out rain world records based on time resolution.
- Parameters:
- data
Rainfall data
- target_gauge_col
Column with rainfall data
- time_res
Temporal resolution of the time series either ‘daily’ or ‘hourly’
- Returns:
- data_not_wr
Data without rain world records
- rainfallqc.utils.stats.fit_expon_and_get_percentile(series: Series, percentiles: list[float]) -> dict[float, float]
Fit exponential to data series and then get percentile using PPF.
- Parameters:
- series
Data series to fit exponential distribution.
- percentiles
Percentiles (between 0-1) to evaluate on the fitted exponential distribution
- Returns:
- expon_percentiles
Threshold at percentile of fitted distribution
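The fit-then-PPF idea can be sketched with the standard library. The sketch assumes an exponential distribution with its location fixed at 0, so the maximum-likelihood scale is simply the sample mean; the library may fit the location as well, so results need not match. The helper name `expon_percentiles` is hypothetical.

```python
import math


def expon_percentiles(values, percentiles):
    """Hypothetical helper: fit an exponential (location fixed at 0) and evaluate its PPF."""
    scale = sum(values) / len(values)  # maximum-likelihood estimate of the scale
    # PPF of the exponential distribution: x = -scale * ln(1 - q)
    return {q: -scale * math.log(1 - q) for q in percentiles}


thresholds = expon_percentiles([1.0, 2.0, 3.0], [0.5, 0.95, 0.99])
```

The returned dictionary maps each requested percentile (between 0 and 1) to the corresponding threshold on the fitted distribution.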
- rainfallqc.utils.stats.gauge_correlation(data: DataFrame, target_col: str, other_col: str) -> float
Calculate correlation between rain gauge data columns.
- Parameters:
- data
Rainfall data
- target_col
Target rainfall column
- other_col
Other rainfall column
- Returns:
- corr_coef
Correlation coefficient.
- rainfallqc.utils.stats.get_rainfall_world_records() -> dict[str, float]
Return rainfall world records as of 29/04/25.
See:
- http://www.nws.noaa.gov/oh/hdsc/record_precip/record_precip_world.html
- http://www.bom.gov.au/water/designRainfalls/rainfallEvents/worldRecRainfall.shtml
- https://wmo.asu.edu/content/world-meteorological-organization-global-weather-climate-extremes-archive
- Returns:
- rwr
rainfall world records set in stats.py
- rainfallqc.utils.stats.percentage_diff(target: Expr, other: Expr) -> Series
Percentage difference between target and other column.
- Parameters:
- target
Target data to compare other to
- other
Other data
- Returns:
- perc_diff
Percentage difference
- rainfallqc.utils.stats.pettitt_test(arr: Series | ndarray)
Pettitt test for detecting a change point in a time series.
Calculated following Pettitt (1979): https://www.jstor.org/stable/2346729?seq=4#metadata_info_tab_contents.
TAKEN FROM: https://stackoverflow.com/questions/58537876/how-to-run-standard-normal-homogeneity-test-for-a-time-series-data.
- Parameters:
- arr : pl.Series or np.ndarray
The input time series data.
- Returns:
- tau : int
Index of the change point (first point of the second segment).
- p : float
p-value for the test statistic.
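A direct, unoptimised sketch of the Pettitt statistic, assuming the usual definitions U_t = sum over i < t <= j of sign(x_i - x_j), K = max |U_t|, and the approximation p ≈ 2 exp(-6 K² / (n³ + n²)). The function name `pettitt` is hypothetical, and this is not the library's code, which may differ in details such as vectorisation.

```python
import math


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def pettitt(values):
    """Hypothetical helper: return (tau, p) for the most likely change point."""
    n = len(values)
    k_max, tau = 0, 0
    for t in range(1, n):
        # Compare every point in the first segment with every point in the second.
        u_t = sum(sign(values[i] - values[j]) for i in range(t) for j in range(t, n))
        if abs(u_t) > k_max:
            k_max, tau = abs(u_t), t
    p = min(1.0, 2 * math.exp(-6 * k_max**2 / (n**3 + n**2)))
    return tau, p


# Ten low values followed by ten high values: a clear shift at index 10.
series = [2.0, 1.0] * 5 + [6.0, 5.0] * 5
tau, p = pettitt(series)
```

The triple loop is cubic in the series length, which is fine for illustration but too slow for long records.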
- rainfallqc.utils.stats.simple_precip_intensity_index(data: DataFrame, target_gauge_col: str, wet_threshold: int | float) -> float
Calculate simple precipitation intensity index.
- Parameters:
- data
Rainfall data
- target_gauge_col
Column with rainfall data
- wet_threshold
Threshold for rainfall intensity in given time period
- Returns:
- sdii_val
Simple precipitation intensity index
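The index is the mean rainfall over wet periods. The sketch below uses a plain list in place of a DataFrame column; the helper name `sdii`, the treatment of `None` as missing, and the inclusive `>=` comparison with `wet_threshold` are assumptions.

```python
def sdii(rainfall, wet_threshold=1.0):
    """Hypothetical helper: mean rainfall over periods at or above wet_threshold."""
    wet = [r for r in rainfall if r is not None and r >= wet_threshold]
    # With no wet periods the index is undefined, so return NaN.
    return sum(wet) / len(wet) if wet else float("nan")


sdii_val = sdii([0.0, 0.2, 5.0, 10.0, None, 3.0], wet_threshold=1.0)
print(sdii_val)
```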
2.1.3.7. Module contents
Utility functions.