Data driven Astronomy

Msouryadeepta: Creating a new draft of DDA with its technical uses and histories mentioned.


'''Data-driven astronomy''' ('''DDA''') refers to the use of [[Data science|data science]] in [[astronomy]]. The outputs of [[Telescopic observational astronomy|telescopic observations]] and [[Astronomical survey|sky surveys]] are processed with [[data mining]] and big data management techniques, which are used to analyze, filter, and [[Normalization (statistics)|normalize]] the resulting [[Data set|datasets]]. These datasets are then used for classification, prediction, and anomaly detection with [[Advances in Statistics|advanced statistical approaches]], [[Digital image processing|digital image processing]], and [[Machine learning|machine learning]]. [[Astronomer|Astronomers]] and space scientists use the results to identify patterns, anomalies, and motions in outer space, and to test theories and make discoveries about the [[cosmos]].

== History ==
In 2007, the [[Galaxy Zoo|Galaxy Zoo project]]<ref>{{Cite web |title=Zooniverse |url=https://www.zooniverse.org/projects/zookeeper/galaxy-zoo |access-date=2024-05-10 |website=www.zooniverse.org}}</ref> was launched for the [[Galaxy morphological classification|morphological classification]]<ref>{{Cite journal |last=Cavanagh |first=Mitchell K. |last2=Bekki |first2=Kenji |last3=Groves |first3=Brent A. |date=2021-07-08 |title=Morphological classification of galaxies with deep learning: comparing 3-way and 4-way CNNs |url=http://arxiv.org/abs/2106.01571 |journal=Monthly Notices of the Royal Astronomical Society |volume=506 |issue=1 |pages=659–676 |doi=10.1093/mnras/stab1552 |issn=0035-8711}}</ref><ref>{{Cite journal |last=Goyal |first=Lalit Mohan |last2=Arora |first2=Maanak |last3=Pandey |first3=Tushar |last4=Mittal |first4=Mamta |date=2020-12-01 |title=Morphological classification of galaxies using Conv-nets |url=https://doi.org/10.1007/s12145-020-00526-w |journal=Earth Science Informatics |language=en |volume=13 |issue=4 |pages=1427–1436 |doi=10.1007/s12145-020-00526-w |issn=1865-0481}}</ref> of a large number of [[Galaxy|galaxies]]. The project set out to classify 900,000 galaxy images taken over the preceding seven years by the [[Sloan Digital Sky Survey|Sloan Digital Sky Survey (SDSS)]]<ref name=":0">{{Cite web |title=Sloan Digital Sky Survey-V: Pioneering Panoptic Spectroscopy - SDSS-V |url=https://www.sdss.org/ |access-date=2024-05-10 |language=en-US}}</ref>. For each image, the task was to decide whether the galaxy was [[Elliptical galaxy|elliptical]] or [[Spiral galaxy|spiral]] and, for spirals, the direction in which it appeared to be spinning.
The project was run by a team of astrophysicists at the [[University of Oxford]] led by [[Kevin Schawinski]]. Schawinski and his colleague [[Chris Lintott]] estimated that it would take their team three to five years to complete the work<ref>{{Cite web |last=Pati |first=Satavisa |date=2021-06-18 |title=How Data Science is Used in Astronomy? |url=https://www.analyticsinsight.net/data-science/how-data-science-is-used-in-astronomy |access-date=2024-05-10 |website=Analytics Insight |language=en}}</ref>. Instead, they opened the classification task to volunteers from the public over the web, and the resulting large set of human-labeled galaxy images has since been used to develop and evaluate [[Machine learning|machine learning]] and data science techniques for analyzing and classifying such images<ref>{{Citation |last=Baron |first=Dalya |title=Machine Learning in Astronomy: a practical overview |date=2019-04-15 |url=http://arxiv.org/abs/1904.07248 |access-date=2024-05-10 |doi=10.48550/arXiv.1904.07248}}</ref>.

== Methodology ==
Data retrieved from sky surveys first undergo [[Data preprocessing|preprocessing]], in which [[Data redundancy|redundant]] entries are removed and the data are filtered. [[Feature engineering|Feature extraction]] is then performed on the filtered dataset, which is passed on to the subsequent analysis steps<ref name=":1">{{Cite journal |last=Zhang |first=Yanxia |last2=Zhao |first2=Yongheng |date=2015-05-22 |title=Astronomy in the Big Data Era |url=http://datascience.codata.org/article/10.5334/dsj-2015-011/ |journal=Data Science Journal |volume=14 |issue=0 |pages=11 |doi=10.5334/dsj-2015-011 |issn=1683-1470}}</ref>. Some well-known sky surveys and survey telescopes are listed below:

* The Palomar Digital Sky Survey (DPOSS)<ref>{{Cite web |title=The Palomar Digital Sky Survey (DPOSS) |url=https://sites.astro.caltech.edu/~george/dposs/dposs_pop.html |access-date=2024-05-10 |website=sites.astro.caltech.edu}}</ref>
* The Two-Micron All Sky Survey (2MASS)<ref>{{Cite web |title=IRSA - Two Micron All Sky Survey (2MASS) |url=https://irsa.ipac.caltech.edu/Missions/2mass.html |access-date=2024-05-10 |website=irsa.ipac.caltech.edu}}</ref>
* [[Green Bank Telescope|Green Bank Telescope (GBT)]]<ref>{{Cite web |date=2023-06-26 |title=GBT |url=https://greenbankobservatory.org/portal/gbt/ |access-date=2024-05-10 |website=Green Bank Observatory |language=en-US}}</ref>
* The Galaxy Evolution Explorer (GALEX)<ref>{{Cite web |title=GALEX - Galaxy Evolution Explorer |url=http://www.galex.caltech.edu/ |access-date=2024-05-10 |website=www.galex.caltech.edu}}</ref>
* The Sloan Digital Sky Survey (SDSS)<ref name=":0" />
* [[SkyMapper|SkyMapper Southern Sky Survey (SMSS)]]<ref>{{Cite web |title=SkyMapper Southern Sky Survey |url=https://skymapper.anu.edu.au/ |access-date=2024-05-10 |website=skymapper.anu.edu.au}}</ref>
* [[Pan-STARRS|The Panoramic Survey Telescope and Rapid Response System (PanSTARRS)]]<ref>{{Cite web |title=Pan-STARRS1 data archive home page - PS1 Public Archive - STScI Outerspace |url=https://outerspace.stsci.edu/display/PANSTARRS/ |access-date=2024-05-10 |website=outerspace.stsci.edu}}</ref>
* [[Vera C. Rubin Observatory|The Large Synoptic Survey Telescope (LSST)]]<ref>{{Cite web |last=Telescope |first=Large Synoptic Survey |title=Rubin Observatory |url=https://www.lsst.org/ |access-date=2024-05-10 |website=Rubin Observatory |language=en}}</ref>
* [[Square Kilometre Array|The Square Kilometer Array (SKA)]]<ref>{{Cite web |title=Explore {{!}} SKAO |url=https://www.skao.int/en |access-date=2024-05-10 |website=www.skao.int}}</ref>

The volume of data produced by the surveys listed above ranges from about 3 [[Terabyte|TB]] to almost 4.6 [[Exabyte|EB]]<ref name=":1" />. The [[data mining]] tasks involved in managing and analyzing these data include [[Statistical classification|classification]], [[Regression analysis|regression]], [[Cluster analysis|clustering]], [[Anomaly detection|anomaly detection]], and [[Time series|time-series analysis]], each of which can be carried out with several different approaches.
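
The preprocessing and feature-extraction steps described above can be illustrated with a minimal Python sketch. It assumes a hypothetical catalogue in which each row holds an object identifier and u, g, r magnitudes; the values are made up and not taken from any survey. Duplicate identifiers and rows with missing magnitudes are dropped, colours (magnitude differences u - g and g - r, a common photometric feature) are extracted, and each feature is z-score normalized, which is only one of several possible normalization choices.

```python
from statistics import mean, pstdev

# Hypothetical toy catalogue: (object_id, u, g, r) magnitudes; None marks a missing value.
catalogue = [
    ("a", 19.0, 18.0, 17.5),
    ("a", 19.0, 18.0, 17.5),  # redundant duplicate entry
    ("b", 20.0, 18.5, 18.0),
    ("c", None, 17.0, 16.0),  # incomplete row, filtered out
    ("d", 18.0, 17.5, 17.0),
]


def preprocess(rows):
    """Remove duplicate object IDs and rows with missing magnitudes."""
    seen = set()
    clean = []
    for obj_id, *mags in rows:
        if obj_id in seen or any(m is None for m in mags):
            continue
        seen.add(obj_id)
        clean.append((obj_id, *mags))
    return clean


def extract_colors(rows):
    """Feature extraction: turn u, g, r magnitudes into (u - g, g - r) colours."""
    return [(u - g, g - r) for _, u, g, r in rows]


def zscore(features):
    """Scale each feature column to zero mean and unit variance (constant columns become 0)."""
    stats = [(mean(col), pstdev(col)) for col in zip(*features)]
    return [
        tuple((x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, stats))
        for row in features
    ]


clean = preprocess(catalogue)
features = zscore(extract_colors(clean))
print(features)
```

Real survey pipelines also handle measurement errors, flags, and cross-matching between catalogues, which this sketch leaves out.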

=== Classification ===
'''''Classification'''''<ref>{{Cite journal |last=Chowdhury |first=Shovan |last2=Schoen |first2=Marco P. |date=2020-10-02 |title=Research Paper Classification using Supervised Machine Learning Techniques |url=https://ieeexplore.ieee.org/document/9249211/ |publisher=IEEE |pages=1–6 |doi=10.1109/IETC47856.2020.9249211 |isbn=978-1-7281-4291-3}}</ref> is used to identify and categorize astronomical objects and data, for example in [[Stellar classification|spectral classification]], photometric classification, morphological classification, and the classification of [[Solar phenomena|solar activity]]. The approaches used for classification are listed below:

* [[Artificial Neural Network|Artificial Neural Networks (ANN)]]
* [[Support vector machine|Support Vector Machines (SVM)]]
* [[Learning vector quantization|Learning Vector Quantization (LVQ)]]
* [[Decision tree|Decision Trees]]
* [[Random forest|Random Forest]]
* [[K-nearest neighbors algorithm|K-Nearest Neighbors]]
* [[Naive Bayes classifier|Naïve Bayes Classifier]]
* [[Radial basis function network|Radial Basis Function Network]]
* [[Gaussian process|Gaussian Process]]
* [[Decision table|Decision Table]]
* [[Alternating decision tree|Alternating Decision Tree (ADTree)]]
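
As a minimal illustration of one of the listed approaches, the Python sketch below implements [[K-nearest neighbors algorithm|k-nearest neighbors]] classification for the elliptical-versus-spiral task described in the History section. The two features (a g - r colour and a concentration index) and all training values are hypothetical, chosen only to make the example readable; a real classifier would be trained on survey measurements and usually on normalized features.

```python
from collections import Counter
from math import dist

# Hypothetical labelled examples: ((g - r colour, concentration index), class).
# The numbers are made up for illustration, not measured values.
training_set = [
    ((0.80, 3.1), "elliptical"),
    ((0.78, 2.9), "elliptical"),
    ((0.75, 3.0), "elliptical"),
    ((0.45, 2.2), "spiral"),
    ((0.50, 2.4), "spiral"),
    ((0.40, 2.3), "spiral"),
]


def knn_classify(train, query, k=3):
    """Return the majority label among the k training points closest to `query`."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_classify(training_set, (0.77, 3.0)))  # elliptical
print(knn_classify(training_set, (0.48, 2.3)))  # spiral
```

An odd value of k avoids tied votes in a two-class problem; the other listed methods would replace the distance-based vote with a learned model.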

=== Regression ===
'''''Regression'''''<ref>{{Citation |last=Sarstedt |first=Marko |title=Regression Analysis |date=2014 |work=A Concise Guide to Market Research: The Process, Data, and Methods Using IBM SPSS Statistics |pages=193–233 |editor-last=Sarstedt |editor-first=Marko |url=https://doi.org/10.1007/978-3-642-53965-7_7 |access-date=2024-05-10 |place=Berlin, Heidelberg |publisher=Springer |language=en |doi=10.1007/978-3-642-53965-7_7 |isbn=978-3-642-53965-7 |last2=Mooi |first2=Erik |editor2-last=Mooi |editor2-first=Erik}}</ref> is used to predict continuous quantities from the retrieved data by fitting statistical models to trends in the data. In astronomy, it is used, for example, to estimate [[Photometric redshift|photometric redshifts]] and to measure the physical parameters of stars<ref>{{Cite journal |title=Bulletin de la Société Royale des Sciences de Liège {{!}} PoPuPS |url=https://popups.uliege.be/0037-9565/index.php |journal=Bulletin de la Société Royale des Sciences de Liège |language=fr |issn=0037-9565}}</ref>. The approaches are listed below:

* [[Artificial Neural Network|Artificial Neural Networks (ANN)]]
* [[Support vector regression|Support Vector Regression (SVR)]]
* [[Decision tree|Decision Trees]]
* [[Random forest|Random Forest]]
* [[K-nearest neighbors algorithm|K-Nearest Neighbors Regression]]
* [[Kernel regression|Kernel Regression]]
* [[Principal component regression|Principal Component Regression (PCR)]]
* [[Gaussian process|Gaussian Process]]
* [[Linear least squares|Least Squares Regression (LSR)]]
* [[Partial least squares regression|Partial Least Squares Regression]]
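
A minimal Python sketch of least squares regression applied to photometric redshift estimation is shown below. It fits a straight line relating a single colour to spectroscopic redshift for a hypothetical training set (the values are made up and exactly linear), then predicts the redshift of a new object from its colour alone. This is a toy under stated assumptions: practical photometric-redshift methods use several colours and often nonlinear models such as the neural networks or nearest-neighbor regression listed above.

```python
from statistics import mean

# Hypothetical training objects with spectroscopic redshifts (made-up, exactly linear values).
colours = [0.2, 0.4, 0.6, 0.8]
spec_z = [0.05, 0.10, 0.15, 0.20]


def fit_least_squares(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def predict(intercept, slope, x):
    """Photometric redshift estimate for a new object with colour x."""
    return intercept + slope * x


intercept, slope = fit_least_squares(colours, spec_z)
print(round(predict(intercept, slope, 0.5), 3))  # 0.125
```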

=== Clustering ===
'''''Clustering'''''<ref>{{Cite journal |last=Bindra |first=Kamalpreet |last2=Mishra |first2=Anuranjan |date=2017-09 |title=A detailed study of clustering algorithms |url=http://ieeexplore.ieee.org/document/8342454/ |publisher=IEEE |pages=371–376 |doi=10.1109/ICRITO.2017.8342454 |isbn=978-1-5090-3012-5}}</ref> groups objects according to a [[similarity measure]], so that objects in the same cluster are more similar to each other than to objects in other clusters. In astronomy, it is used both for classification and for the detection of [[Object detection|special or rare objects]]. The approaches are listed below:

* [[Principal component analysis|Principal Component Analysis (PCA)]]
* [[DBSCAN]]
* [[K-means clustering|K-Means Clustering]]
* [[OPTICS algorithm|OPTICS]]
* [[Cobweb model]]
* [[Self-organizing map|Self Organizing Map (SOM)]]
* [[Expectation–maximization algorithm|Expectation Maximization]]
* [[Hierarchical clustering|Hierarchical Clustering]]
* AutoClass<ref>{{Cite journal |last=Pizzuti |first=C. |last2=Talia |first2=D. |date=2003-05 |title=P-autoclass: scalable parallel clustering for mining large data sets |url=http://ieeexplore.ieee.org/document/1198395/ |journal=IEEE Transactions on Knowledge and Data Engineering |language=en |volume=15 |issue=3 |pages=629–641 |doi=10.1109/TKDE.2003.1198395 |issn=1041-4347}}</ref>
* [[Mixture model|Gaussian Mixture Modeling (GMM)]]
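
The Python sketch below shows [[K-means clustering|k-means clustering]] (Lloyd's algorithm), one of the approaches listed above, grouping objects in a two-dimensional feature space such as a colour-colour diagram. The points are hypothetical and form two obvious groups. For simplicity the first k points are used as initial centroids; this is an assumption of the sketch, and practical implementations use better initialization (for example k-means++) and several restarts.

```python
from math import dist

# Hypothetical objects in a 2-D feature space (e.g. two colours); values are made up.
points = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.2), (5.1, 4.9), (0.2, 0.1), (4.9, 5.2)]


def kmeans(data, k, iterations=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = [tuple(p) for p in data[:k]]  # naive initialization: first k points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in data:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members (keep it if empty).
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return centroids, clusters


centroids, clusters = kmeans(points, k=2)
print(centroids)
```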

=== Anomaly Detection ===
'''''Anomaly Detection'''''<ref>{{Cite journal |last=Thudumu |first=Srikanth |last2=Branch |first2=Philip |last3=Jin |first3=Jiong |last4=Singh |first4=Jugdutt (Jack) |date=2020-07-02 |title=A comprehensive survey of anomaly detection techniques for high dimensional big data |url=https://doi.org/10.1186/s40537-020-00320-x |journal=Journal of Big Data |volume=7 |issue=1 |pages=42 |doi=10.1186/s40537-020-00320-x |issn=2196-1115}}</ref> is used to detect irregularities in a dataset. In astronomy, it is used to find [[Object detection|rare or special objects]]. The following approaches are used:

* [[Principal component analysis|Principal Component Analysis (PCA)]]
* [[K-means clustering|K-Means Clustering]]
* [[Expectation–maximization algorithm|Expectation Maximization]]
* [[Hierarchical clustering|Hierarchical Clustering]]
* [[Support vector machine|One-class SVM]]
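
One way to use [[Principal component analysis|principal component analysis]] for anomaly detection is to score each object by how far it lies from the main direction of variation in the data, which is its reconstruction error when only the first principal component is kept. The Python sketch below does this for hypothetical two-dimensional data: most points follow a linear trend and one made-up outlier does not. The closed-form angle of the leading eigenvector applies only to the 2×2 case; higher-dimensional data would need a general eigendecomposition.

```python
from math import atan2, cos, sin
from statistics import mean

# Hypothetical objects: nine points on a linear trend plus one made-up outlier at the end.
points = [(x, 2 * x) for x in range(-4, 5)] + [(2, -4)]


def pca_residuals(data):
    """Anomaly score = distance of each 2-D point from the first principal axis."""
    mx = mean(x for x, _ in data)
    my = mean(y for _, y in data)
    sxx = sum((x - mx) ** 2 for x, _ in data)
    syy = sum((y - my) ** 2 for _, y in data)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    # Angle of the leading eigenvector of the 2x2 covariance matrix.
    theta = 0.5 * atan2(2 * sxy, sxx - syy)
    # Perpendicular distance to the principal axis (one-component reconstruction error).
    return [abs(-(x - mx) * sin(theta) + (y - my) * cos(theta)) for x, y in data]


scores = pca_residuals(points)
print(scores.index(max(scores)))  # 9, the index of the outlier
```

The outlier itself pulls the principal axis slightly, which is why robust variants of PCA are often preferred when anomalies are numerous.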

=== Time-Series Analysis ===
'''''Time-Series Analysis'''''<ref>{{Cite book |url=https://onlinelibrary.wiley.com/doi/book/10.1002/0471264385 |title=Handbook of Psychology |date=2003-04-15 |publisher=Wiley |isbn=978-0-471-17669-5 |editor-last=Weiner |editor-first=Irving B. |edition=1 |language=en |doi=10.1002/0471264385.wei0223}}</ref> is used to analyze how data change over time and to predict future values. In astronomy, it is used for trend prediction and novelty detection (the detection of previously unseen patterns in the data). The approaches used here are:

* [[Artificial Neural Network|Artificial Neural Networks (ANN)]]
* [[Support vector regression|Support Vector Regression (SVR)]]
* [[Decision tree|Decision Trees]]
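
The Python sketch below illustrates novelty detection on a hypothetical light curve (brightness measured over time; all values are made up). It does not implement any of the methods listed above: as a deliberately simple stand-in, it predicts each sample from the mean of the preceding samples and flags samples that deviate from that prediction by more than a chosen number of standard deviations. The window length and threshold are arbitrary assumptions. In a fuller pipeline, a trained model such as an ANN, SVR, or decision tree could replace the rolling mean as the predictor, with large prediction errors flagged in the same way.

```python
from statistics import mean, pstdev

# Hypothetical normalized light curve with one made-up flare-like spike at index 9.
light_curve = [1.00, 1.02, 0.98, 1.01, 0.99, 1.00, 1.03, 0.97, 1.00, 1.60, 1.01, 0.99]


def flag_novel_points(flux, window=5, threshold=3.0):
    """Return indices whose value deviates from the preceding `window` samples by > threshold sigma."""
    flagged = []
    for i in range(window, len(flux)):
        history = flux[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        if sigma > 0 and abs(flux[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


print(flag_novel_points(light_curve))  # [9]
```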
