Data driven Astronomy

News · 10 May 2024

Msouryadeepta: Creating a new draft of DDA with its technical uses and histories mentioned.

'''Data Driven Astronomy (DDA)''' refers to the use of [[data science]] in [[astronomy]]. Outputs of [[Telescopic observational astronomy|telescopic observations]] and [[Astronomical survey|sky surveys]] are analyzed, filtered, and [[Normalization (statistics)|normalized]] using [[data mining]] and big data management techniques. The resulting [[Data set|datasets]] are then used for classification, prediction, and anomaly detection by means of [[Advances in Statistics|advanced statistical approaches]], [[digital image processing]], and [[machine learning]]. [[Astronomer]]s and space scientists use the output of these processes to study and identify patterns, anomalies, and movements in outer space and to draw conclusions about the [[cosmos]].

== History ==
In 2007, the [[Galaxy Zoo|Galaxy Zoo project]]<ref>{{Cite web |title=Zooniverse |url=https://www.zooniverse.org/projects/zookeeper/galaxy-zoo |access-date=2024-05-10 |website=www.zooniverse.org}}</ref> was launched for the [[Galaxy morphological classification|morphological classification]]<ref>{{Cite journal |last=Cavanagh |first=Mitchell K. |last2=Bekki |first2=Kenji |last3=Groves |first3=Brent A. |date=2021-07-08 |title=Morphological classification of galaxies with deep learning: comparing 3-way and 4-way CNNs |url=http://arxiv.org/abs/2106.01571 |journal=Monthly Notices of the Royal Astronomical Society |volume=506 |issue=1 |pages=659–676 |doi=10.1093/mnras/stab1552 |issn=0035-8711}}</ref><ref>{{Cite journal |last=Goyal |first=Lalit Mohan |last2=Arora |first2=Maanak |last3=Pandey |first3=Tushar |last4=Mittal |first4=Mamta |date=2020-12-01 |title=Morphological classification of galaxies using Conv-nets |url=https://doi.org/10.1007/s12145-020-00526-w |journal=Earth Science Informatics |language=en |volume=13 |issue=4 |pages=1427–1436 |doi=10.1007/s12145-020-00526-w |issn=1865-0481}}</ref> of a large number of [[Galaxy|galaxies]]. The project considered 900,000 images collected by the [[Sloan Digital Sky Survey|Sloan Digital Sky Survey (SDSS)]]<ref name=":0">{{Cite web |title=Sloan Digital Sky Survey-V: Pioneering Panoptic Spectroscopy - SDSS-V |url=https://www.sdss.org/ |access-date=2024-05-10 |language=en-US}}</ref> over the preceding 7 years. The task was to study each picture of a galaxy, classify it as [[Elliptical galaxy|elliptical]] or [[Spiral galaxy|spiral]], and determine whether it was spinning or not.
A team of astrophysicists led by [[Kevin Schawinski]] at the [[University of Oxford]] was in charge of this project. Schawinski and his colleague [[Chris Lintott]] estimated that it would take such a team 3–5 years to complete the work.<ref>{{Cite web |last=Pati |first=Satavisa |date=2021-06-18 |title=How Data Science is Used in Astronomy? |url=https://www.analyticsinsight.net/data-science/how-data-science-is-used-in-astronomy |access-date=2024-05-10 |website=Analytics Insight |language=en}}</ref> They therefore came up with the idea of using machine learning and data science techniques to analyze and classify the images.<ref>{{Citation |last=Baron |first=Dalya |title=Machine Learning in Astronomy: a practical overview |date=2019-04-15 |url=http://arxiv.org/abs/1904.07248 |access-date=2024-05-10 |doi=10.48550/arXiv.1904.07248}}</ref>

== Methodology ==
The data retrieved from the sky surveys are first [[Data preprocessing|pre-processed]]: [[Data redundancy|redundant]] records are removed and the data are filtered. [[Feature engineering|Feature extraction]] is then performed on the filtered data set, which is passed on to the data mining tasks described later in this section.<ref name=":1">{{Cite journal |last=Zhang |first=Yanxia |last2=Zhao |first2=Yongheng |date=2015-05-22 |title=Astronomy in the Big Data Era |url=http://datascience.codata.org/article/10.5334/dsj-2015-011/ |journal=Data Science Journal |volume=14 |issue=0 |pages=11 |doi=10.5334/dsj-2015-011 |issn=1683-1470}}</ref> Some well-known sky surveys and survey facilities are listed below:

* The Palomar Digital Sky Survey (DPOSS)<ref>{{Cite web |title=The Palomar Digital Sky Survey (DPOSS) |url=https://sites.astro.caltech.edu/~george/dposs/dposs_pop.html |access-date=2024-05-10 |website=sites.astro.caltech.edu}}</ref>
* The Two-Micron All Sky Survey (2MASS)<ref>{{Cite web |title=IRSA - Two Micron All Sky Survey (2MASS) |url=https://irsa.ipac.caltech.edu/Missions/2mass.html |access-date=2024-05-10 |website=irsa.ipac.caltech.edu}}</ref>
* [[Green Bank Telescope|Green Bank Telescope (GBT)]]<ref>{{Cite web |date=2023-06-26 |title=GBT |url=https://greenbankobservatory.org/portal/gbt/ |access-date=2024-05-10 |website=Green Bank Observatory |language=en-US}}</ref>
* The Galaxy Evolution Explorer (GALEX)<ref>{{Cite web |title=GALEX - Galaxy Evolution Explorer |url=http://www.galex.caltech.edu/ |access-date=2024-05-10 |website=www.galex.caltech.edu}}</ref>
* The Sloan Digital Sky Survey (SDSS)<ref name=":0" />
* [[SkyMapper|SkyMapper Southern Sky Survey (SMSS)]]<ref>{{Cite web |title=SkyMapper Southern Sky Survey |url=https://skymapper.anu.edu.au/ |access-date=2024-05-10 |website=skymapper.anu.edu.au}}</ref>
* [[Pan-STARRS|The Panoramic Survey Telescope and Rapid Response System (PanSTARRS)]]<ref>{{Cite web |title=Pan-STARRS1 data archive home page - PS1 Public Archive - STScI Outerspace |url=https://outerspace.stsci.edu/display/PANSTARRS/ |access-date=2024-05-10 |website=outerspace.stsci.edu}}</ref>
* [[Vera C. Rubin Observatory|The Large Synoptic Survey Telescope (LSST)]]<ref>{{Cite web |last=Telescope |first=Large Synoptic Survey |title=Rubin Observatory |url=https://www.lsst.org/ |access-date=2024-05-10 |website=Rubin Observatory |language=en}}</ref>
* [[Square Kilometre Array|The Square Kilometre Array (SKA)]]<ref>{{Cite web |title=Explore {{!}} SKAO |url=https://www.skao.int/en |access-date=2024-05-10 |website=www.skao.int}}</ref>

The data volumes of the above-mentioned sky surveys range from 3 [[Terabyte|TB]] to almost 4.6 [[Exabyte|EB]].<ref name=":1" /> The [[data mining]] tasks involved in managing and manipulating these data include [[Statistical classification|classification]], [[Regression analysis|regression]], [[Cluster analysis|clustering]], [[Anomaly detection|anomaly detection]], and [[Time series|time-series analysis]]. Each of these tasks can be carried out with several approaches, which are listed in the following subsections.
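
As a rough illustration of the pre-processing step described at the start of this section, the sketch below removes duplicate records and then standardizes each feature column. It uses only the Python standard library; the function names and the catalogue values (magnitude, colour index) are hypothetical and are not taken from any of the surveys listed above.

```python
from statistics import mean, pstdev


def deduplicate(rows):
    """Drop exact duplicate rows while preserving first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def standardize(rows):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical catalogue rows: (magnitude, colour index). The third row repeats the first.
catalogue = [[18.2, 0.61], [19.5, 1.10], [18.2, 0.61], [17.1, 0.35]]
clean = deduplicate(catalogue)
features = standardize(clean)
```

Real survey pipelines typically also handle missing values, near-duplicates from overlapping exposures, and unit conversions, none of which are covered by this sketch.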

=== Classification ===
'''''Classification'''''<ref>{{Cite journal |last=Chowdhury |first=Shovan |last2=Schoen |first2=Marco P. |date=2020-10-02 |title=Research Paper Classification using Supervised Machine Learning Techniques |url=https://ieeexplore.ieee.org/document/9249211/ |publisher=IEEE |pages=1–6 |doi=10.1109/IETC47856.2020.9249211 |isbn=978-1-7281-4291-3}}</ref> is used to identify and categorize astronomical data, for example in [[Stellar classification|spectral classification]], photometric classification, morphological classification, and the classification of [[Solar phenomena|solar activity]]. The classification approaches used are listed below:

* [[Artificial Neural Network|Artificial Neural Networks (ANN)]]
* [[Support vector machine|Support Vector Machines (SVM)]]
* [[Learning vector quantization|Learning Vector Quantization (LVQ)]]
* [[Decision tree|Decision Trees]]
* [[Random forest|Random Forest]]
* [[K-nearest neighbors algorithm|K-Nearest Neighbors]]
* [[Naive Bayes classifier|Naïve Bayesian Networks]]
* [[Radial basis function|Radial Basis Function Network]]
* [[Gaussian process|Gaussian Process]]
* [[Decision table|Decision Table]]
* [[Alternating decision tree|Alternating Decision Tree (ADTree)]]
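
To make one of the listed approaches concrete, the following is a minimal sketch of a k-nearest-neighbours classifier written with the Python standard library only. The two features (a concentration index and a spiral-arm strength) and all training values are invented for illustration; they do not come from Galaxy Zoo or SDSS.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points closest to `query`."""
    neighbours = sorted(
        zip(train_X, train_y), key=lambda pair: dist(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical features per galaxy: (concentration index, spiral-arm strength).
train_X = [(0.90, 0.20), (0.80, 0.30), (0.85, 0.25),
           (0.30, 0.80), (0.20, 0.90), (0.25, 0.70)]
train_y = ["elliptical"] * 3 + ["spiral"] * 3

label = knn_predict(train_X, train_y, (0.28, 0.82))
```

With these toy values the query point lies among the three "spiral" examples, so the majority vote returns that label. The choice of `k` and of the distance metric are tuning decisions that this sketch leaves at simple defaults.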

=== Regression ===
'''''Regression'''''<ref>{{Citation |last=Sarstedt |first=Marko |title=Regression Analysis |date=2014 |work=A Concise Guide to Market Research: The Process, Data, and Methods Using IBM SPSS Statistics |pages=193–233 |editor-last=Sarstedt |editor-first=Marko |url=https://doi.org/10.1007/978-3-642-53965-7_7 |access-date=2024-05-10 |place=Berlin, Heidelberg |publisher=Springer |language=en |doi=10.1007/978-3-642-53965-7_7 |isbn=978-3-642-53965-7 |last2=Mooi |first2=Erik |editor2-last=Mooi |editor2-first=Erik}}</ref> is used to make predictions from the retrieved data through statistical trends and statistical modeling. It is applied to estimating [[Photometric redshift|photometric redshifts]] and measuring the physical parameters of stars.<ref>{{Cite journal |title=Bulletin de la Société Royale des Sciences de Liège {{!}} PoPuPS |url=https://popups.uliege.be/0037-9565/index.php |journal=Bulletin de la Société Royale des Sciences de Liège |language=fr |issn=0037-9565}}</ref> The approaches are listed below:

* [[Artificial Neural Network|Artificial Neural Networks (ANN)]]
* [[Support vector regression|Support Vector Regression (SVR)]]
* [[Decision tree|Decision Trees]]
* [[Random forest|Random Forest]]
* [[K-nearest neighbors algorithm|K-Nearest Neighbors Regression]]
* [[Kernel regression|Kernel Regression]]
* [[Principal component regression|Principal Component Regression (PCR)]]
* [[Gaussian process|Gaussian Process]]
* [[Linear least squares|Least Squares Regression (LSR)]]
* [[Partial least squares regression|Partial Least Squares Regression]]
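
As a small worked example of least squares regression from the list above, the sketch below fits a straight line to a single feature using the closed-form solution. The colour and redshift values are made up for illustration; real photometric redshift estimation uses several bands and far more flexible models.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept for one feature."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training pairs: (colour index, spectroscopic redshift).
colour = [0.2, 0.4, 0.6, 0.8, 1.0]
redshift = [0.05, 0.11, 0.14, 0.21, 0.24]

slope, intercept = fit_line(colour, redshift)
# Predict the redshift of a new object from its colour alone.
predicted = slope * 0.7 + intercept
```

For this toy data the fitted slope is about 0.24 and the intercept about 0.006, so an object with colour index 0.7 is assigned a redshift of roughly 0.174.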

=== Clustering ===
'''''Clustering'''''<ref>{{Cite journal |last=Bindra |first=Kamalpreet |last2=Mishra |first2=Anuranjan |date=2017-09 |title=A detailed study of clustering algorithms |url=http://ieeexplore.ieee.org/document/8342454/ |publisher=IEEE |pages=371–376 |doi=10.1109/ICRITO.2017.8342454 |isbn=978-1-5090-3012-5}}</ref> groups objects based on a [[similarity measure]]. In astronomy it is used for classification as well as for [[Object detection|special/rare object detection]]. The approaches are listed below:

* [[Principal component analysis|Principal Component Analysis (PCA)]]
* [[DBSCAN]]
* [[K-means clustering|K-Means Clustering]]
* [[OPTICS algorithm|OPTICS]]
* [[Cobweb model]]
* [[Self-organizing map|Self Organizing Map (SOM)]]
* [[Expectation–maximization algorithm|Expectation Maximization]]
* [[Hierarchical clustering|Hierarchical Clustering]]
* AutoClass<ref>{{Cite journal |last=Pizzuti |first=C. |last2=Talia |first2=D. |date=2003-05 |title=P-autoclass: scalable parallel clustering for mining large data sets |url=http://ieeexplore.ieee.org/document/1198395/ |journal=IEEE Transactions on Knowledge and Data Engineering |language=en |volume=15 |issue=3 |pages=629–641 |doi=10.1109/TKDE.2003.1198395 |issn=1041-4347}}</ref>
* [[Mixture model|Gaussian Mixture Modeling (GMM)]]
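
The k-means approach from the list above can be sketched in a few lines of standard-library Python. This is a simplified version under stated assumptions: the initial centroids are supplied by the caller (rather than chosen at random), the number of iterations is fixed, and the two-dimensional points are invented.

```python
from math import dist


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to centroids and recomputing centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroid
            for members, centroid in zip(clusters, centroids)
        ]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical 2-D feature vectors forming two well-separated groups.
points = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9),
          (5.0, 5.2), (5.1, 4.9), (4.9, 5.0)]
centroids, labels = kmeans(points, centroids=[points[0], points[3]])
```

On this toy input the first three points end up in cluster 0 and the last three in cluster 1. Production implementations usually add random restarts and a convergence check instead of a fixed iteration count.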

=== Anomaly Detection ===
'''''Anomaly Detection'''''<ref>{{Cite journal |last=Thudumu |first=Srikanth |last2=Branch |first2=Philip |last3=Jin |first3=Jiong |last4=Singh |first4=Jugdutt (Jack) |date=2020-07-02 |title=A comprehensive survey of anomaly detection techniques for high dimensional big data |url=https://doi.org/10.1186/s40537-020-00320-x |journal=Journal of Big Data |volume=7 |issue=1 |pages=42 |doi=10.1186/s40537-020-00320-x |issn=2196-1115}}</ref> is used for detecting irregularities in a dataset. In astronomy, it is used to detect [[Object detection|rare/special objects]]. The following approaches are used:

* [[Principal component analysis|Principal Component Analysis (PCA)]]
* [[K-means clustering|K-Means Clustering]]
* [[Expectation–maximization algorithm|Expectation Maximization]]
* [[Hierarchical clustering|Hierarchical Clustering]]
* [[Support vector machine|One-class SVM]]
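
The idea shared by the clustering-based approaches above is that rare objects lie far from where most of the data sit. The sketch below is a deliberately simplified stand-in for those methods, not an implementation of any of them: it measures each point's distance from the coordinate-wise median and flags points whose distance exceeds a multiple of the median distance. The `factor` threshold and the sample points are assumptions chosen for illustration.

```python
from math import dist
from statistics import median


def flag_anomalies(points, factor=3.0):
    """Flag points lying unusually far from the coordinate-wise median of the data."""
    centre = tuple(median(coord) for coord in zip(*points))
    distances = [dist(p, centre) for p in points]
    # Median distance is a robust scale estimate that a single outlier barely shifts.
    threshold = factor * median(distances)
    return [p for p, d in zip(points, distances) if d > threshold]


# Hypothetical feature vectors: five ordinary objects and one that is far from the rest.
points = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9),
          (1.0, 0.95), (1.05, 1.0), (6.0, 6.5)]
anomalies = flag_anomalies(points)
```

Here only the point `(6.0, 6.5)` is flagged. A median-based centre and scale are used so that the outlier itself does not distort the threshold, which is a common concern with mean-based scores on small samples.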

=== Time-Series Analysis ===
'''''Time-Series Analysis'''''<ref>{{Cite book |url=https://onlinelibrary.wiley.com/doi/book/10.1002/0471264385 |title=Handbook of Psychology |date=2003-04-15 |publisher=Wiley |isbn=978-0-471-17669-5 |editor-last=Weiner |editor-first=Irving B. |edition=1 |language=en |doi=10.1002/0471264385.wei0223}}</ref> helps in analyzing trends and predicting outputs over time. It is used for trend prediction and novelty detection (detection of previously unknown data). The approaches used here are:

* [[Artificial Neural Network|Artificial Neural Networks (ANN)]]
* [[Support vector regression|Support Vector Regression (SVR)]]
* [[Decision tree|Decision Trees]]
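
Regressors such as those listed above are usually applied to a time series by first turning it into a supervised problem: each training input is a window of past values and the target is the value that follows. The sketch below shows that windowing step and, as a plainly simpler substitute for the listed models, fits a one-lag linear predictor through the origin by least squares. The flux values are invented.

```python
def make_windows(series, lags):
    """Build (inputs, targets): each input holds `lags` past values, the target is the next one."""
    count = len(series) - lags
    X = [series[i:i + lags] for i in range(count)]
    y = [series[i + lags] for i in range(count)]
    return X, y


def fit_ar1(series):
    """Least squares estimate of phi in x[t] = phi * x[t-1] (no intercept)."""
    X, y = make_windows(series, lags=1)
    numerator = sum(x[0] * target for x, target in zip(X, y))
    denominator = sum(x[0] ** 2 for x in X)
    return numerator / denominator


# Hypothetical light-curve fluxes that halve at every time step.
flux = [8.0, 4.0, 2.0, 1.0, 0.5]

X, y = make_windows(flux, lags=2)   # inputs for any regressor that accepts feature vectors
phi = fit_ar1(flux)                 # one-lag linear stand-in
forecast = phi * flux[-1]           # predicted next value
```

For this series the fitted coefficient is 0.5, giving a forecast of 0.25 for the next step. The `X` and `y` lists produced by `make_windows` have the same shape that a neural network, support vector regressor, or decision tree would take as training data.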
